Abstract

Background: In previous work, RAF theory has been developed as a tool for making theoretical progress on the 
origin of life question, providing insight into the structure and occurrence of self-sustaining and collectively 
autocatalytic sets within catalytic polymer networks. We present here an extension in which there are two 
"independent" polymer sets, where catalysis occurs within and between the sets, but there are no reactions 
combining polymers from both sets. Such an extension reflects the interaction between nucleic acids and peptides 
observed in modern cells and proposed forms of early life. 

Results: We present theoretical work and simulations which suggest that the occurrence of autocatalytic sets is 
robust to the partitioned structure of the network. We also show that autocatalytic sets remain likely even when the 
molecules in the system are not polymers, and a low level of inhibition is present. Finally, we present a kinetic 
extension which assigns a rate to each reaction in the system, and show that identifying autocatalytic sets within such 
a system is an NP-complete problem. 

Conclusions: Recent experimental work has challenged the necessity of an RNA world by suggesting that 
peptide-nucleic acid interactions occurred early in chemical evolution. The present work indicates that such a 
peptide-RNA world could support the spontaneous development of autocatalytic sets and is thus a feasible 
alternative worthy of investigation. 

Keywords: Origin of life, Peptide-RNA world, Autocatalysis 



Background 

Understanding the origin of life on Earth is an impor- 
tant and fascinating problem [1]. In order to shed light 
on the structure of early replicators and their mechanism 
of formation, various experimental approaches have been 
explored [2-5]. Due to the enormity of the task, experi- 
mental work alone seems unlikely to answer the question, 
and this has motivated several theoretical investigations 
[6-9]. While one goal of theoretical work is to accelerate 
experimental progress (either in top-down construction of 
a minimal cell [10], or the spontaneous formation of a self- 
replicating protocell from abiotic precursor molecules), 
links between theory and experiment have been scarce. 
Naturally, theoretical models are simplifications of real 
chemistry, and while such simplification enables progress, 
it may limit the conversation between theorists and experimentalists until the models more accurately reflect the
complexity of real biochemical systems. 

The combinatorial and stochastic aspects of theoreti- 
cal work on the origin of life mean mathematics has an 
important role to play. The intuitive analogy between sets 
of reacting compounds and directed graphs was the moti- 
vation for Bollobas and Rasmussen's work on directed 
cycles in random graphs [11]. In previous work [8,12-15], 
RAF theory has been developed as an effective tool for 
making progress on theoretical questions about the ori- 
gin of life, based on initial work by Kauffman [7,16]. In 
particular, it appears the emergence of collectively auto- 
catalytic and self-sustaining sets of chemical reactions 
(RAF sets, defined later) is necessary for the origin of life 
to occur. Previous work has investigated the structure of 
such sets and the probability of their formation, leading to 
theoretical and empirical (simulation-based) results. 

The general ideas behind RAF theory are not unique,
and there are several related formalisms [6,9,17]. How- 
ever, in some cases, questions within the RAF framework 
have proven tractable while an equivalent question posed 
within an alternative formalism has not, perhaps because 
of the simplicity of the RAF model. On the other hand, 
it has been suggested that such simplicity limits our abil- 
ity to draw conclusions about "real" biochemical systems. 
However, the recent demonstration of the ability of RAF 
theory to link theoretical and experimental results [4,18], 
together with the ongoing development of fresh theoret- 
ical ideas [19], suggests that this framework continues to 
enable progress. 

In this paper, we present a biologically relevant exten- 
sion to the well-studied polymer model, formalising a net- 
work of molecules in which there are two "independent" 
types of polymer, which are able to catalyse each others' 
(and their own) reactions, but cannot combine to form 
hybrid polymers. The motivation for this is the nature of 
the interaction between peptides and nucleic acids in the 
metabolic networks of modern cells. The importance of 
an extension addressing this mutually catalytic arrange- 
ment was highlighted in Kauffman's 1986 paper, in note 
(vii) (p. 14): "An independent then symbiotic coexistence of 
autocatalytic protein sets and template replicative polynucleotides
would obviously be useful in prebiotic evolution".
(While the present work does not address the templating
ability of nucleic acids, this aspect has been studied
previously [14,20]). Moreover, this extension is highly rel- 
evant in the light of recent experimental results from Li 
et al. [5]. In their paper, the authors propose that interac- 
tions between polypeptides and polynucleotides occurred 
very early in chemical evolution, providing an alternative 
to the hypothesis that life began in an RNA World [21]. 
The authors state "The striking reciprocity of proteins and 
RNA in biology is consistent with our proposal: proteins 
exclusively catalyze nucleic acid synthesis; RNA catalyzes 
protein synthesis; and genetic messages are interpreted by 
the small ribosomal subunit, a ribonucleoprotein." The 
reciprocity described here provides a clear motivation 
for theoretical investigation into the properties of these 
"symbiotic" polymer systems. 

We present theoretical results showing that RAF sets are 
just as likely to emerge in such systems as in those pre- 
viously studied [14], and it turns out that the result holds 
even for a more general system in which the molecules are 
not necessarily polymers, a small amount of inhibition is 
allowed, and the amount of catalysis varies freely across 
the reaction network. In previous work, catalysis has been 
assigned randomly with equal probability between each 
molecule and each reaction. The current work shows that 
RAF sets remain highly probable even under heterogeneous
catalysis, which is what we might expect to find in
real biochemical networks. 



As a step toward increased chemical realism, we intro- 
duce the concept of a kinetic chemical reaction system, 
in which every reaction has an associated rate, and all 
molecules are lost via diffusion into the environment at a 
constant rate. We can in principle then search for RAFs 
in the system (as in previous work [8]) with the additional 
requirement that every molecule in the RAF must be pro- 
duced at least as fast as it is used up or diffuses away - we 
call such an RAF a kinetically viable RAF (kRAF). 

Definitions 

We will use the notation of Hordijk and Steel [8]. Consider
a triple $(X, \mathcal{R}, F)$, where

• $X = \{x_1, x_2, \ldots\}$ is a (finite) set of molecular species
or molecule types;

• $F \subseteq X$ is a distinguished subset of molecular species
known as the food set, the set of all species initially
available in the environment;

• $\mathcal{R} = \{r_1, r_2, \ldots\}$ is a (finite) set of chemically allowed
reactions;

• Each reaction $r \in \mathcal{R}$ is an ordered pair $(A, B)$, where
$A \subseteq X$ is a multiset of reactants and $B \subseteq X$ is a
multiset of products. We can represent a reaction as
$a_1 + a_2 + \cdots + a_n \to b_1 + b_2 + \cdots + b_m$. Note that the
reactants $a_i$ are not necessarily distinct, and neither
are the products $b_i$. Also note that reversible
reactions can be modelled as two (formally) separate
reactions $(A, B), (B, A) \in \mathcal{R}$.

The triple $(X, \mathcal{R}, F)$ is therefore a set of molecular species
together with the reactions that occur between them,
intuitively visualised as a directed graph. For brevity, we
will often use the term "molecule" in place of "molecular
species" or "molecule type". We also define $\rho(r)$
to be the set of all distinct reactants of the reaction
$r$, and $\pi(r)$ to be the set of all distinct products of $r$.
Then for any subset $\mathcal{R}'$ of $\mathcal{R}$,
$\rho(\mathcal{R}') := \bigcup_{r \in \mathcal{R}'} \rho(r)$ and
$\pi(\mathcal{R}') := \bigcup_{r \in \mathcal{R}'} \pi(r)$. Another useful concept will be
the support of a reaction $r$, $\mathrm{supp}(r) := \rho(r) \cup \pi(r)$.
Similarly, $\mathrm{supp}(\mathcal{R}') := \rho(\mathcal{R}') \cup \pi(\mathcal{R}')$ for any subset
$\mathcal{R}'$ of $\mathcal{R}$. Informally, the support of a set of reactions is
the set of all molecules consumed or produced by those
reactions.

We can equip the triple $(X, \mathcal{R}, F)$ with a catalysation
assignment $C \subseteq X \times \mathcal{R}$, where $(x, r) \in C$ is understood
to mean that the molecule $x$ catalyses reaction $r$; that is, $x$
accelerates $r$ but is unchanged by the reaction. A chemical
reaction system (CRS) is now defined as a triple $(X, \mathcal{R}, F)$
together with a catalysation assignment $C$. We will denote
a CRS $\mathcal{Q}$ by $\mathcal{Q} = (X, \mathcal{R}, F, C)$. Figure 1 shows an example
of a CRS within the binary polymer model, defined by
Kauffman [7] and well studied by Hordijk and Steel [8].
In this model, all molecule types are polymers over a 2-letter
alphabet, and each reaction is either the ligation of
two molecules into a longer polymer, or the cleavage of a
single molecule into two shorter polymers.

Figure 1 A chemical reaction system. A simple CRS within the
binary polymer model, where the food set consists of all monomers
and dimers. The subset $\{r_1, r_2, r_3, r_5\}$ is the maximal RAF subset (the
maxRAF), while $\{r_1, r_2\}$, $\{r_3\}$ and $\{r_1, r_2, r_3\}$ are smaller RAFs within the
maximal RAF, known as subRAFs. $\{r_1, r_2\}$ is an example of an irreducible
RAF, since no proper subset of it is an RAF. A set such as $\{r_3\}$,
which consists of a single autocatalytic reaction with food molecules
as reactants, is sometimes called a 'trivial RAF'.

The final important concept is that of the closure of the
food set relative to a subset of reactions $\mathcal{R}' \subseteq \mathcal{R}$, denoted
$\mathrm{cl}_{\mathcal{R}'}(F)$ and formally defined as the minimal subset $W \subseteq X$
which contains $F$ and satisfies $\rho(r) \subseteq W \Rightarrow \pi(r) \subseteq W$
for all $r \in \mathcal{R}'$. Informally, $\mathrm{cl}_{\mathcal{R}'}(F)$ is the set of all molecules
that can be built up from the food set using only reactions
in $\mathcal{R}'$ (ignoring catalysis).
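
As an illustration, the closure can be computed by repeatedly firing any reaction whose reactants are already available, until nothing new appears. The Python sketch below is not taken from the cited work: the representation of a reaction as a pair of frozensets, the function name `closure` and the toy species are all assumptions made for the example.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# A reaction is modelled here as (reactants, products). Multiplicities are
# dropped because the closure only depends on which species are present.
Reaction = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(food: Iterable[str], reactions: Iterable[Reaction]) -> Set[str]:
    """Species reachable from the food set using the given reactions (catalysis ignored)."""
    reachable = set(food)
    pending = list(reactions)
    changed = True
    while changed:
        changed = False
        still_pending = []
        for reactants, products in pending:
            if reactants <= reachable:
                # All reactants are available, so the products become available.
                if not products <= reachable:
                    reachable |= products
                    changed = True
            else:
                still_pending.append((reactants, products))
        pending = still_pending
    return reachable


# Hypothetical toy example: binary polymers, food = all monomers and dimers.
food = {"0", "1", "00", "01", "10", "11"}
ligations = [
    (frozenset({"00", "11"}), frozenset({"0011"})),
    (frozenset({"0011", "1"}), frozenset({"00111"})),
    (frozenset({"101", "0"}), frozenset({"1010"})),  # "101" is never produced
]
print(sorted(closure(food, ligations) - food))  # ['0011', '00111']
```

The third reaction never fires because its reactant 101 is neither food nor a product of another reaction, so 1010 stays outside the closure.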

Following [13], we say that a subset $\mathcal{R}'$ of $\mathcal{R}$ forms a
reflexively autocatalytic and food-generated set (an RAF
set) for $\mathcal{Q}$ provided that $\mathcal{R}'$ is non-empty and that:

(i) All the reactants of each reaction in $\mathcal{R}'$ are contained
in $\mathrm{cl}_{\mathcal{R}'}(F)$ (food-generated);

(ii) For each $r \in \mathcal{R}'$, there exists $(x, r) \in C$ such that
$x \in \mathrm{cl}_{\mathcal{R}'}(F)$ (reflexively autocatalytic).

We commonly use "F-generated" in place of "food- 
generated" and "RAF" in place of "RAF set". Informally, 
property (i) requires that the reactions in $\mathcal{R}'$ must be able
to sustain themselves from the food set alone. Property
(ii) requires that every reaction in $\mathcal{R}'$ must be catalysed,
and furthermore that the catalysts must themselves be 
generated from the food set by that same set of reactions. 

These definitions are intended to capture properties of 
chemical networks that may have been important in the 
emergence of early replicators. Uncatalysed reactions in 
general proceed extremely slowly. We require catalysis so 
that molecules accumulate in concentrations sufficient to 
perform useful biochemical tasks. Otherwise, they would 
diffuse away before being able to play any role in the 



emergence of the first replicator. Moreover, not only do 
catalysts greatly increase the reaction rates, they also lead 
to an equally dramatic reduction in the variance of the 
rate of reactions (cf. [22], figure six); this last feature
would seem to be important for obtaining some degree of 
synchronicity in both early and present-day metabolism. 
However, to allow the catalysts to come out of nowhere 
would be begging the question. So in addition, we require 
that the reactions generate their own catalysts from the 
food set (the set of all molecules available in a particular 
environment on early Earth). 

The idea of a set being F-generated requires that no 
molecules are required as reactants before they have been 
produced. A set that fails to be F-generated could never 
have spontaneously built itself up from the molecules 
available on early Earth (the food set), which is clearly a
necessary condition for the development of early replica- 
tors from prebiotic chemistry. Note however that while 
the reflexively-autocatalytic requirement guarantees that 
an RAF set of reactions eventually produces a catalyst 
for every reaction, the definition of F-generated allows a 
reaction to proceed prior to the production of any of its 
catalysts. We consider this to be reasonable (and realistic) 
for the following reason. Reactions can proceed uncatal- 
ysed (albeit at a much lower rate), which may soon lead 
to the production of a catalyst for the reaction, estab- 
lishing a positive feedback loop which quickly increases 
the rate of the reaction (consider the production of the 
molecule 0011 in Figure 1; this molecule is the sole cata- 
lyst for its own production). In previous work [13] we have 
studied a stronger type of autocatalytic set in which a catalyst
must be present before a reaction can progress at all.
These sets, referred to as constructively autocatalytic and 
F-generated sets (CAFs) have quite different properties to 
RAFs; indeed, they are less likely to appear spontaneously. 

Figure 1 illustrates some ways in which a set can fail
to be an RAF. The subset $\{r_1, r_2, r_3, r_5, r_7\}$ fails to be
reflexively autocatalytic (and so fails to be an RAF) since
$r_7$ is uncatalysed. In the subset $\{r_1, r_2, r_3, r_5, r_6\}$ all reactions
are catalysed; however, the catalyst of $r_6$ is outside
$\mathrm{cl}_{\{r_1, r_2, r_3, r_5, r_6\}}(F)$ (the reactions do not collectively generate
all of their own catalysts), so this subset also fails to
be reflexively autocatalytic. The subset $\{r_1, r_2, r_3, r_4, r_5\}$ is
reflexively autocatalytic (since every reaction is catalysed,
and all the catalysts are in $\mathrm{cl}_{\{r_1, \ldots, r_5\}}(F)$), but it is not
F-generated, since the reactant 101 of $r_4$ is not in the closure
set (it cannot be created from the food set by the reactions
$\{r_1, \ldots, r_5\}$). However, the subset $\{r_1, r_2, r_3, r_5\}$ is an RAF.
In fact, it is the largest RAF in the system, equal to the
union of all RAFs in the system. Such an RAF is referred
to as the maximal RAF subset or the maxRAF.

Given any catalytic reaction system $\mathcal{Q} = (X, \mathcal{R}, F, C)$,
there is a fast (polynomial-time) algorithm which determines
whether or not $\mathcal{Q}$ contains an RAF, and if so
the algorithm constructs the maxRAF [8]. We use this
algorithm in section "Simulations of partitioned chemical
reactions systems" to study the emergence of RAFs within
simulations of the partitioned polymer system, defined in
the following section.
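
One way to obtain the maxRAF in polynomial time is a reduction procedure: start from all reactions, compute the closure of the food set, discard every reaction that has a reactant outside the closure or no catalyst inside it, and repeat until nothing is discarded. The Python sketch below follows that idea. It is an illustration under stated assumptions rather than the implementation of [8]: the data layout, the names `closure` and `max_raf`, and the toy system (which is not the CRS of Figure 1) are all hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

Reaction = Tuple[FrozenSet[str], FrozenSet[str]]  # (reactants, products)


def closure(food: Set[str], reactions: Dict[str, Reaction]) -> Set[str]:
    """Species obtainable from the food set using the given reactions."""
    reachable = set(food)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions.values():
            if reactants <= reachable and not products <= reachable:
                reachable |= products
                changed = True
    return reachable


def max_raf(
    food: Set[str],
    reactions: Dict[str, Reaction],
    catalysts: Dict[str, Set[str]],
) -> Set[str]:
    """Return the names of the reactions in the maxRAF (empty set if none)."""
    current = dict(reactions)
    while current:
        available = closure(food, current)
        kept = {
            name: rxn
            for name, rxn in current.items()
            # Keep a reaction only if all reactants are in the closure and
            # at least one of its catalysts is in the closure.
            if rxn[0] <= available and catalysts.get(name, set()) & available
        }
        if len(kept) == len(current):
            break  # fixed point: every remaining reaction qualifies
        current = kept
    return set(current)


# Hypothetical toy system; food = all binary monomers and dimers.
food = {"0", "1", "00", "01", "10", "11"}
reactions = {
    "r1": (frozenset({"00", "11"}), frozenset({"0011"})),
    "r2": (frozenset({"0011", "1"}), frozenset({"00111"})),
    "r3": (frozenset({"00111", "0"}), frozenset({"001110"})),
    "r4": (frozenset({"101", "0"}), frozenset({"1010"})),
}
catalysts = {
    "r1": {"0011"},  # autocatalytic: the product catalyses its own formation
    "r2": {"01"},    # catalysed by a food molecule
    "r3": set(),     # uncatalysed
    "r4": {"0011"},  # catalysed, but reactant "101" is never produced
}
print(sorted(max_raf(food, reactions, catalysts)))  # ['r1', 'r2']
```

Each pass either removes at least one reaction or stops, so the loop runs at most as many times as there are reactions, and each pass costs a polynomial amount of work.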

Note that the definitions of a CRS and of an RAF do not 
explicitly include consideration of reaction rates or con- 
centrations. Therefore, the RAF formalism cannot address 
the more specific question of whether a population
of molecules can remain stable enough to catalyse
its own growth from the food set, and to keep growing over time
so that the set can reproduce, issues that are obviously
of interest in an origin of life scenario. For example, an 
RAF might include an exceedingly rare reaction, the rate 
of which could never support the growth of the system, or 
a very fast reaction, which depletes an essential molecule. 
However, this purely algebraic approach has allowed the 
development of several important results that would not 
have been easy to deduce from a more detailed model. 
Nonetheless, once an RAF set is discovered, it can then 
be checked for dynamical stability: previous work [15,18] 
has involved molecular flow simulations of RAF sets using 
the Gillespie algorithm [23]. Also, in section "Kinetic 
RAF framework" we consider an extension of the for- 
mal RAF framework which does take reaction rates into 
account. 

Partitioned polymer system 

All modern life utilises at least two polymers for link- 
ing information to structure and function: nucleic acids 
(DNA/RNA) and peptides. Nucleic acids store and prop- 
agate genetic information, while peptides perform struc- 
tural, catalytic and signalling roles in vivo in the form 
of proteins, enzymes and hormones. The interaction 
between peptides and nucleic acids is fundamental to the 
most important biochemical processes: peptides catal- 
yse the replication of DNA and the synthesis of mRNA 
in transcription; at the ribosome, a combination of pep- 
tides and catalytic RNA molecules (ribozymes) catal- 
yse the translation of mRNA, generating new peptide 
sequences. At the same time, each of these polymers 
catalyses reactions within its own class: for example, proteolytic
enzymes catalyse the cleavage of peptides, and a
gene (DNA) could be considered to "catalyse" transcrip- 
tion of mRNA by acting as a template (Figure 2). Despite 
the mutual catalytic dependence of nucleic acids and pep- 
tides in living systems, these polymers are independent 
in the sense that there are no "hybrid" polymers contain- 
ing both nucleotide and amino acid monomers 12 . In order 
to formalise these properties we introduce the following 
generalisation of the well studied polymer model [8]. 

Consider a triple $(X, \mathcal{R}, F)$ within the polymer model.
Let $X$, $\mathcal{R}$ and $F$ be partitioned as $X = \{X_1, X_2\}$,
$\mathcal{R} = \{\mathcal{R}_1, \mathcal{R}_2\}$ and $F = \{F_1, F_2\}$, where

• $X_1, X_2$ are disjoint sets of polymers;

• $F_1 \subseteq X_1$ and $F_2 \subseteq X_2$ are disjoint sets of food
molecules;

• $\mathcal{R}_i$ is a set of ligation and cleavage reactions such that
$\mathrm{supp}(\mathcal{R}_i) \subseteq X_i$, for $i \in \{1, 2\}$.

Figure 2 Reciprocity of peptides and nucleic acids. Schematic
depicting the mutual catalytic dependence between nucleic acids
and peptides in living systems, where a dashed arrow from X to Y
indicates that there exist reactions involving molecules in Y which are
catalysed by molecules in X. While all possible such arrows are
present in the diagram, both groups of molecules are "closed" in the
sense that there are no reactions combining nucleotide and amino
acid monomers in the same polymer.

A partitioned CRS is now defined as a triple (partitioned
as above) together with a catalysation assignment $C$. We
will use the word module to refer to the set of molecules
$X_1$ together with the associated reactions $\mathcal{R}_1$, and similarly
for $X_2$ and $\mathcal{R}_2$. Hence, a partitioned CRS consists of
two modules, and catalysis can occur both within (intra-modular)
and between (inter-modular) the modules (the
specific pattern of catalysis will depend on the nature of
$C$). Note however that due to the condition $\mathrm{supp}(\mathcal{R}_i) \subseteq X_i$,
there can be no reactions involving molecules from
both $X_1$ and $X_2$. We also allow $X_1$ and $X_2$ to be sets of
polymers over different sized monomer alphabets. For
example, let the size of these alphabets be $k_1$ and $k_2$: then
to model the interaction between a set of peptides ($X_1$)
and a set of RNA polymers ($X_2$), set $k_1 = 20$, $k_2 = 4$.
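
To make the construction concrete, the following Python sketch enumerates the molecules and forward (ligation) reactions of two small modules and checks that the supports do not overlap. It is only an illustration: the alphabets `"pq"` and `"ACGU"`, the maximum lengths and the helper names `polymers` and `ligations` are assumptions chosen to keep the example small, and cleavage reactions are understood as the reverses of the ligations listed.

```python
from itertools import product
from typing import List, Tuple


def polymers(alphabet: str, max_len: int) -> List[str]:
    """All polymers of length 1..max_len over the given monomer alphabet."""
    return [
        "".join(p)
        for n in range(1, max_len + 1)
        for p in product(alphabet, repeat=n)
    ]


def ligations(species: List[str], max_len: int) -> List[Tuple[str, str, str]]:
    """Forward reactions a + b -> ab whose product stays within max_len."""
    return [
        (a, b, a + b)
        for a in species
        for b in species
        if len(a) + len(b) <= max_len
    ]


# Hypothetical small modules; the peptide/RNA case in the text uses k1 = 20, k2 = 4.
X1 = polymers("pq", 3)    # k1 = 2, maximum length 3
X2 = polymers("ACGU", 2)  # k2 = 4, maximum length 2
R1 = ligations(X1, 3)
R2 = ligations(X2, 2)

# Every molecule appearing in a reaction of module i belongs to X_i,
# so no reaction mixes the two modules.
support_1 = {m for rxn in R1 for m in rxn}
support_2 = {m for rxn in R2 for m in rxn}
assert support_1 <= set(X1) and support_2 <= set(X2)
assert not set(X1) & set(X2)

print(len(X1), len(R1), len(X2), len(R2))  # 14 20 20 16
```

Inter-modular interaction enters only through the catalysation assignment, which is why the reaction lists of the two modules can be built independently.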

Figure 3 shows a simple partitioned CRS within the 
binary polymer model. Previous work [8,13] has demon- 
strated that in the standard, unpartitioned polymer model, 
RAFs are highly likely to be present in a CRS, given some 
mild requirements on the level of catalysis. Since the level 
of catalysis may vary across the network in the partitioned 
model, and since the partition makes the underlying struc- 
ture of the reaction network qualitatively different, it is 
not obvious whether RAFs might be more or less likely 
to occur. This question is addressed more generally in 
the next section, where we prove a stronger result which 
is certainly sufficient to show that a partitioned CRS is 
just as likely to contain RAFs as an unpartitioned one. 
We will present the general result, before returning to the 
partitioned model. 

Figure 3 A partitioned CRS. A partitioned CRS within the binary
polymer model. One set of molecules is built from the 'square' and
'circle' monomers, and the other is built from the 'triangle' and
'hexagon' monomers. The molecules at the bottom of the image
comprise the food set and give rise to the other molecules via the
ligation and cleavage reactions $r_1, \ldots, r_4$. Dashed arrows indicate
which molecules catalyse which reactions. In this case, the entire CRS 
is an RAF. Note that while there is intra- and inter-modular catalysis, 
there are no reactions involving molecules from both modules. This is 
emphasized by the enclosure of each module within a large circle: of 
course, in real systems, the molecules would be free to mingle. 



Results and discussion 

The probability of RAFs in general catalytic reaction 
systems 

It was shown in [13] that for a CRS within the polymer 
model, the level of catalysis (expected number of reactions 
catalysed per molecule) necessary and sufficient to pro- 
duce RAF sets with a given probability increases linearly 
with n, the maximum length of polymers in the system. 
Here we extend this result to a general CRS in which the 
molecules are not necessarily polymers, and we invoke 
slightly weaker assumptions by allowing the catalysation 
rates to vary between reactions; in a later section this 
approach also allows for a limited degree of inhibition. 

For convenience, we will assume that the set of reactions
$\mathcal{R}$ is the disjoint union of two sets $\mathcal{R}^+$ and $\mathcal{R}^-$, where
every reaction in $\mathcal{R}^+$ is of the form $a + b \to c$ (two
reactants and one product), and $\mathcal{R}^-$ consists entirely of
the corresponding reverse reactions $c \to a + b$, so that
$|\mathcal{R}^+| = |\mathcal{R}^-|$. We refer to the reactions in $\mathcal{R}^+$ as 'forward'
reactions. Thus pairs of corresponding reactions from $\mathcal{R}^+$
and $\mathcal{R}^-$ can be considered as a single reversible reaction.
We will also assume that a molecule catalyses $r \in \mathcal{R}^+$ if
and only if that molecule also catalyses the corresponding
reverse reaction in $\mathcal{R}^-$, which reflects the reality of biological catalysis.
These assumptions can be weakened, but doing so complicates
slightly the statement and proofs of the results that
follow, and they apply readily to the partitioned system
that we study, as do the further conditions listed below.

In our generalised model we make two main assumptions
concerning catalysation:

(C1) The events $\mathcal{E}(x, r)$ that molecule $x$ catalyses
(forward) reaction $r$ are independent across all pairs
$(x, r) \in X \times \mathcal{R}^+$.

(C2) For some constant $K \ge 1$, the expected number of
molecular species that catalyse any reaction is at
most $K$ times the expected number of molecular
species that catalyse any other reaction.

Note that (C1) allows different molecule types to catalyse
different numbers of reactions in expectation, since
the probability that molecule type $x$ catalyses reaction $r$
can vary according to both $x$ and $r$ (in [13] it was assumed
that the probability of $\mathcal{E}(x, r)$ depends only on $x$, not on $r$).

Before stating the main result of this section, we require
the following definition. We say that a triple $(X, \mathcal{R}, F)$
has a species stratification if and only if there is a nested
sequence $\alpha_1 \subset \alpha_2 \subset \cdots \subset \alpha_m = X$ such that the following
conditions hold: (i) $F = \alpha_t$ for some $t < m$; (ii) If
the reaction $f \to a + b$ is in $\mathcal{R}$ where $f \in F$ then $a$ and $b$
are also elements of $\alpha_t$; (iii) The number of forward reactions
involving any two food molecules as reactants is at
most some fixed constant $M$; (iv) if we let $X(1) := \alpha_1$ and
$X(s) := \alpha_s - \alpha_{s-1}$ for $s \in \{2, \ldots, m\}$ then:

(S1) The number of molecules in $\alpha_s$ grows no faster than
geometrically with $s$. That is, $|X(s)| \le k^s$ for some
fixed $k > 1$, for all $s \in \{1, \ldots, m\}$;

(S2) Every molecule in $X(s)$ can be constructed from
molecules in $\alpha_{s-1}$ by a number of forward reactions
that grows at least linearly with $s - 1$. More precisely,
for some fixed $\nu > 0$, the following holds: for each
$s \in \{t + 1, \ldots, m\}$, and for all $x \in X(s)$ we have:

$$|\{r \in \mathcal{R}^+ : x \in \pi(r) \text{ and } \rho(r) \subseteq \alpha_{s-1}\}| \ge \nu(s - 1).$$

We now show that for any triple $(X, \mathcal{R}, F)$ the probability
that $\mathcal{Q} = (X, \mathcal{R}, F, C)$ (where the random assignment
$C$ satisfies (C1) and (C2)) has an RAF (denoted
$P(\exists\, \text{RAF for } \mathcal{Q})$) is, under certain conditions, determined
by how the average catalysation rate compares to the simple
ratio of the total number of forward reactions to the
total number of molecules.

Let $\mu$ be the average expected number of forward reactions
that are catalysed by a molecular species (averaged
over all molecular species in $X$). That is:

$$\mu = \frac{1}{|X|} \sum_{x \in X} \sum_{r \in \mathcal{R}^+} P(\mathcal{E}(x, r)).$$


The proof of part (a) of the following theorem is pre- 
sented in the Appendix; part (b) follows immediately from 
a stronger result stated later (Theorem 2) and the proof 
of that later result is also in the Appendix. 

Theorem 1. For any triple $(X, \mathcal{R}, F)$ that has a species
stratification, consider the random CRS $\mathcal{Q} = (X, \mathcal{R}, F, C)$
formed by an assignment of catalysation ($C$) under any
stochastic process satisfying (C1) and (C2).

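(a) If $\mu \le \lambda \cdot \frac{|\mathcal{R}^+|}{|X|}$ then the probability that there exists
an RAF for $\mathcal{Q}$ is at most $\phi(\lambda)$, where
$\phi(\lambda) = 1 - (1 - \cdots)^{\tau} \to 0$ as $\lambda \to 0$, and where $\tau$ is a
constant dependent only on $k$ and $t$.

(b) If $\mu \ge \lambda \cdot \frac{|\mathcal{R}^+|}{|X|}$ then the probability that there exists
an RAF for $\mathcal{Q}$ is at least $1 - \psi(\lambda)$, where
$\psi(\lambda) = \frac{\cdots}{1 - k e^{-\nu\lambda/K}} \to 0$ exponentially fast as $\lambda \to \infty$.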

The results in section "Simulations of partitioned chem- 
ical reactions systems" show that as the level of catalysis is 
increased past some threshold there is a transition in the 
probability of the existence of RAFs. This is to be expected 
as it is well known in combinatorics that every monotone 
increasing property of subsets of a set has an associated 
threshold function [24]. Consideration of the definitions 
of reflexively autocatalytic and F-generated reveals that 
the RAF property is monotone on the subsets of the set 
of possible catalysis arcs from molecules to reactions in 
a CRS, so the RAF property has a threshold function. In 
the original binary polymer model, the threshold func- 
tion for catalysis is linear in n (the maximal sequence 
length). However, in the more general setting considered 
here, molecules do not come equipped with an intrinsic
length. Nevertheless, Theorem 1 shows that the ratio of 
'reactions-to-molecules' plays essentially the same role as 
n in a threshold function for the RAF property. 

Remarks 

• The proof of part (b) involves the construction of an
RAF $\mathcal{R}'$ involving every molecule in $X$ (that is,
$\mathrm{supp}(\mathcal{R}') = X$). However, in general, this RAF will
involve only a subset of the reactions in $\mathcal{R}^+$.

• In general, the definition of a species stratification
seems rather artificial: while a CRS within the simple
(unpartitioned) polymer model naturally admits a
species stratification (since we just let $\alpha_s$ be the set of
all polymers up to length $s$), it would be a non-trivial
exercise to find a species stratification for a CRS with
molecules that are not polymers. Nevertheless,
Theorem 1 shows that the molecules in a CRS being
polymers is sufficient but not necessary, and we will
see shortly that in the partitioned polymer model a
species stratification also applies.

The probability of RAFs in a partitioned CRS 

In light of Theorem 1, in order to show that the same linear
catalysis requirement that applies for an unpartitioned
CRS holds for a partitioned one, we need only show that
a partitioned CRS has a species stratification, and construct
a set $C$ satisfying (C1), (C2). In what follows, we
will consider a partitioned CRS that satisfies the same
assumptions that were made in the proof of Theorem 1
(i.e. $\mathcal{R} = \mathcal{R}^+ \cup \mathcal{R}^-$, and corresponding reactions from
$\mathcal{R}^+$ and $\mathcal{R}^-$ are always catalysed together). Also, let a
reaction $r \in \mathcal{R}^+$ belong to $\mathcal{R}_1$ if and only if the corresponding
reverse reaction in $\mathcal{R}^-$ does too, and let $\mathcal{R}_1^+$ denote the subset
of all forward reactions in $\mathcal{R}_1$. Applying similar restrictions
to $\mathcal{R}_2$, we thus consider a partitioned CRS in which
$\mathcal{R}$ is the disjoint union of four sets, $\mathcal{R}_1^+$, $\mathcal{R}_2^+$, $\mathcal{R}_1^-$ and $\mathcal{R}_2^-$,
so that each module consists of an equal number of forward
and reverse reactions together with the associated
molecules (of course, the modules may contain different
numbers of reactions to each other).

In previous work [19], $C$ was often generated by randomly
assigning catalysis as follows: let each element of
$X \times \mathcal{R}^+$ (and the corresponding element of $X \times \mathcal{R}^-$) be
included in $C$ with some fixed probability $p$. When studying
metabolic network data from real organisms, we might
expect to find that this uniform model does not match
the observed pattern of catalysis: for example, it might
be the case that peptides tend to catalyse more reactions
involving other peptides than reactions involving nucleic
acids. To allow for this possibility in a partitioned CRS, we
allow the likelihood of catalysis to vary depending on both
the nature of the catalyst and the nature of the molecules
involved in the reaction. Specifically, we define the matrix
$P$ where, for any molecule $x \in X_i$ and any reaction $r \in \mathcal{R}_j$,
the probability that $x$ catalyses $r$ (and the corresponding
reverse reaction) is given by the $ij$th entry of $P$. For
example, in a CRS generated using the matrix

i] 

we would expect to observe around ten times more cataly- 
sis within modules than between them, and twice as much 
catalysis of reactions in IZi by molecules in Xi than of 
reactions in IZi by molecules in X\. 
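
A catalysation assignment of this kind can be sampled directly from the matrix. The Python sketch below is a minimal illustration, assuming that entry `P[i][j]` is the probability that a molecule of module `i` catalyses a reaction of module `j`, and that each reaction identifier stands for a forward reaction together with its reverse. The function name `assign_catalysis`, the species and reaction labels, and the numerical values in `P_example` are hypothetical.

```python
import random
from typing import Sequence, Set, Tuple


def assign_catalysis(
    species_by_module: Sequence[Sequence[str]],
    reactions_by_module: Sequence[Sequence[str]],
    P: Sequence[Sequence[float]],
    rng: random.Random,
) -> Set[Tuple[str, str]]:
    """Sample C: x in module i catalyses r in module j with probability P[i][j]."""
    C = set()
    for i, species in enumerate(species_by_module):
        for j, reactions in enumerate(reactions_by_module):
            for x in species:
                for r in reactions:
                    # Independent Bernoulli trial for every (molecule, reaction) pair.
                    if rng.random() < P[i][j]:
                        C.add((x, r))
    return C


# Hypothetical labels for two tiny modules.
species = [["p", "q", "pq"], ["A", "C", "AC"]]
reactions = [["p+q<->pq"], ["A+C<->AC"]]

# Hypothetical probabilities: more catalysis within modules than between them.
P_example = [[0.5, 0.05],
             [0.05, 0.5]]
C = assign_catalysis(species, reactions, P_example, random.Random(0))
print(len(C))
```

Because every pair is sampled independently and the probability depends only on the two module indices, an assignment generated this way is of the form covered by the independence assumption used in the theoretical results.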

In what follows, consider a partitioned CRS $\mathcal{Q} =
(X, \mathcal{R}, F, C)$ which is complete: that is, $X_1$ and $X_2$
contain every possible polymer up to length $n_1$ and
$n_2$ (respectively), and $\mathcal{R}_1$ (respectively $\mathcal{R}_2$) contains
every possible forward and reverse reaction between the
molecules in $X_1$ (respectively $X_2$). Let $F_1$ (respectively $F_2$)
be all the molecules in $X_1$ (respectively $X_2$) up to some
length $t < \min\{n_1, n_2\}$. Finally, for a molecule $x \in X$, let $|x|$
denote the length of $x$ (i.e. the number of monomer units
in $x$).

For $X_1$, define the stratification

$$\alpha_1 \subset \alpha_2 \subset \cdots \subset \alpha_t \subset \cdots \subset \alpha_{n_1} = X_1$$

where $\alpha_s$ consists of all the molecules $x$ in $X_1$ such that
$1 \le |x| \le s$. It will prove useful to define $X_1(1) := \alpha_1$ and for
$s \in \{2, \ldots, n_1\}$, $X_1(s) := \alpha_s - \alpha_{s-1}$. Similarly for $X_2$, define
the stratification

$$\beta_1 \subset \beta_2 \subset \cdots \subset \beta_t \subset \cdots \subset \beta_{n_2} = X_2$$

and let $X_2(s)$ be defined similarly to $X_1(s)$. Note that
$|X_i(s)| = k_i^s$. Defining $n_{\min} := \min\{n_1, n_2\}$ and
$n_{\max} := \max\{n_1, n_2\}$, these stratifications are combined into a single
stratification $\gamma_1, \gamma_2, \ldots$ of the set $X$ as follows:

• for $1 \le s \le n_{\min}$, $\gamma_s := \alpha_s \cup \beta_s$;

• for $n_{\min} < s \le n_{\max}$,

$$\gamma_s := \begin{cases} \alpha_s, & \text{if } n_1 > n_2 \\ \beta_s, & \text{if } n_2 > n_1 \end{cases}$$

Note that $F = \gamma_t$, which is condition (i) in the definition
of a species stratification; conditions (ii) and (iii) also
clearly hold (with $M = 2$ for condition (iii)), so it remains
to establish condition (iv), namely that the stratification
satisfies (S1) and (S2). Define $X(1) := \gamma_1$ and for
$s \in \{2, \ldots, n_{\max}\}$, $X(s) := \gamma_s - \gamma_{s-1}$, and consider the size
of each set $X(s)$. Since $|X(s)|$ does not exceed $k_1^s + k_2^s$ for
any value of $s$, and $k_1^s + k_2^s \le (k_1 + k_2)^s$, we have
$|X(s)| \le (k_1 + k_2)^s$ for all $s \in \{1, \ldots, n_{\max}\}$, so the partitioned
CRS satisfies (S1) with $k = k_1 + k_2$.
To see that it also satisfies (S2), we need only note that for
any molecule type $x \in X(s)$ where $s \in \{t + 1, \ldots, n_{\max}\}$,
$|x| = s$, so there are a maximum of $s - 1$ ways $x$ could
be constructed from shorter molecule types (i.e. molecule
types in $\gamma_{s-1}$). Since $\mathcal{R}_1$ and $\mathcal{R}_2$ are both complete, every
such reaction exists and there are in fact precisely $s - 1$ forward
reactions generating $x$ from $\gamma_{s-1}$, so take $\nu = 1$. We
conclude that the complete partitioned CRS has a species
stratification.

It remains to show that the catalysation assignment $C$ described above satisfies (C1), (C2). For each pair $(x, r) \in X \times \mathcal{R}$, the probability that $x$ catalyses $r$ (and the corresponding reverse reaction) is dependent only on which module $x$ and $r$ belong to, so (C1) clearly holds. The following expression gives the expected number of species that catalyse any given reaction:

$$\begin{bmatrix} p_{11} & p_{21} \\ p_{12} & p_{22} \end{bmatrix} \begin{bmatrix} |X_1| \\ |X_2| \end{bmatrix} = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}$$

where $c_i$ is the expected number of species in $X$ that catalyse any given reaction in $\mathcal{R}_i$. Noting that $p_{ij} \in [0, 1]$ and that

$$|X| = |X_1| + |X_2| = \sum_{s=1}^{n_1} k_1^s + \sum_{s=1}^{n_2} k_2^s,$$

clearly $c_1, c_2$ are finite. Hence taking $K = \max\{c_1/c_2, c_2/c_1\}$ shows that (C2) holds also. We conclude that Theorem 1 applies to a partitioned CRS.
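As a small numerical illustration of the argument above, the following sketch computes the expected number of catalysts per reaction in each module and the resulting ratio bound K. The function names and the example values of P, k_i and n_i are hypothetical, chosen only for illustration; P[i][j] is assumed to be the probability that a molecule in module i catalyses a reaction in module j.

```python
def module_size(k, n):
    """Number of polymers of length 1..n over k monomer types: sum of k^s."""
    return sum(k ** s for s in range(1, n + 1))


def expected_catalysts(P, sizes):
    """Return [c_1, c_2, ...] with c_j = sum_i P[i][j] * |X_i|."""
    return [sum(P[i][j] * sizes[i] for i in range(len(sizes)))
            for j in range(len(P[0]))]


def ratio_bound(c):
    """K = max{c_1/c_2, c_2/c_1}; assumes every c_j is strictly positive."""
    return max(c) / min(c)


# Hypothetical example: two small modules and an arbitrary catalysis matrix.
sizes = [module_size(2, 2), module_size(2, 3)]
P = [[0.01, 0.02],
     [0.03, 0.01]]
c = expected_catalysts(P, sizes)
K = ratio_bound(c)
```

Because every entry of P lies in [0, 1] and both modules are finite, each c_j is finite, so K is finite whenever both c_j are positive.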



Simulations of partitioned chemical reaction systems

Previous simulations of chemical reaction systems [8,14] have focussed on those which are complete (X contains every molecule up to some maximum length n, and $\mathcal{R}$ contains every possible cleavage/ligation reaction between the molecules of X) and those in which catalysis is assigned randomly such that every molecule has the same fixed probability of catalysing any reaction. In [13,14], it was shown both theoretically and computationally that in a 'classic' CRS with only one module, the level of catalysis (expected number of reactions catalysed per molecule) necessary and sufficient to generate RAFs with a given probability (e.g. 0.5) increases linearly with n. Furthermore, simulations show that the linear relationship is not steep: when n = 10, the required level of catalysis is around 1.29, and when n = 20, the required level of catalysis increases only to 1.48 [14]. Based on the finding that many enzymes catalyse multiple reactions [25], and the results of a recent search for RAF sets in the metabolic network of E. coli (Sousa FL, Hordijk W, Steel M, Martin W: Autocatalytic sets in the metabolic network of E. coli, in preparation), this level of catalysis appears to be biologically feasible. Hence, the above results suggest that RAFs might be expected for real biochemical polymer networks, even under a random assignment of catalysis.

Theorem 1 assures us that the linear increase in the 
required level of catalysis seen in the original model also 
applies to the partitioned model. However, it is not obvi- 
ous whether or not the same realistic level of catalysis will 
be seen in the latter. In particular, because the partitioned 
model is highly flexible in terms of possible patterns of 
catalysis (forms of the matrix P), it is interesting to ask 
how the pattern of catalysis affects the probability of RAF 
formation. In order to address this question, we simulated 
a partitioned CRS in which each module is complete, with 
$k_1 = k_2 = 2$, n = 10, and the food set consists of all
monomers and dimers. This CRS was simulated under 
three different catalytic assignments (Figure 4). In order 
to isolate the effect of the pattern of catalysis on the prob- 
ability of RAF formation, the overall level of catalysis is 
constant across all three scenarios for any given value of 
p. We generated 500 instances of each model at a range of 
values of p (corresponding to a range of levels of cataly- 
sis) and searched for RAFs using the algorithm from [8]. 
Analyses were performed on an IBM Power755 cluster 
comprising 13 nodes, each with 32 CPUs running Linux 
11.1 (a total of 416 CPUs). 
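The experiment described above can be sketched in miniature as follows. This is not the code used for the reported simulations: it is a simplified illustration that considers ligation reactions in the forward direction only, uses a much smaller maximum length than n = 10, and assumes that the three catalysis matrices take the forms p·I (intra-modular), p times the off-diagonal matrix (inter-modular) and (p/2) times the all-ones matrix (uniform). All function names and the module alphabets "ab" and "cd" are hypothetical. The RAF search is the standard reduction: repeatedly discard reactions whose reactants or catalysts cannot be generated from the food set.

```python
import itertools
import random


def build_module(alphabet, n):
    """All polymers of length 1..n over `alphabet`, plus every ligation x + y -> xy."""
    molecules = ["".join(t) for s in range(1, n + 1)
                 for t in itertools.product(alphabet, repeat=s)]
    reactions = [((m[:i], m[i:]), m) for m in molecules for i in range(1, len(m))]
    return molecules, reactions


def patterns(p):
    """Assumed forms of the three catalysis matrices (same overall level)."""
    return {"intra-modular": [[p, 0.0], [0.0, p]],
            "inter-modular": [[0.0, p], [p, 0.0]],
            "uniform": [[p / 2, p / 2], [p / 2, p / 2]]}


def assign_catalysis(modules, P, rng):
    """P[i][j]: probability that a molecule of module i catalyses a reaction of module j."""
    reactions, catalysts = [], []
    for j, (_, module_reactions) in enumerate(modules):
        for reaction in module_reactions:
            cats = {x for i, (mols, _) in enumerate(modules)
                    for x in mols if rng.random() < P[i][j]}
            reactions.append(reaction)
            catalysts.append(cats)
    return reactions, catalysts


def closure(food, reactions):
    """Molecules obtainable from the food set using the given reactions."""
    present = set(food)
    changed = True
    while changed:
        changed = False
        for (x, y), product in reactions:
            if product not in present and x in present and y in present:
                present.add(product)
                changed = True
    return present


def max_raf(food, reactions, catalysts):
    """Indices of the reactions in the maximal RAF (empty set if there is none)."""
    current = set(range(len(reactions)))
    while True:
        available = closure(food, [reactions[i] for i in current])
        keep = {i for i in current
                if reactions[i][0][0] in available
                and reactions[i][0][1] in available
                and catalysts[i] & available}
        if keep == current:
            return current
        current = keep


def fraction_with_raf(P, n=4, instances=20, seed=0):
    """Fraction of random instances containing an RAF (food: monomers and dimers)."""
    rng = random.Random(seed)
    modules = [build_module("ab", n), build_module("cd", n)]
    food = {m for mols, _ in modules for m in mols if len(m) <= 2}
    hits = 0
    for _ in range(instances):
        reactions, catalysts = assign_catalysis(modules, P, rng)
        hits += bool(max_raf(food, reactions, catalysts))
    return hits / instances
```

Calling `fraction_with_raf(patterns(p)["uniform"])` over a range of p values gives a rough analogue of the curves discussed in the text, although at such small n the numbers are not comparable to the reported results.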

Figure 5 shows, at each level of catalysis, the fraction 
of the 500 instances which were found to contain an 
RAF, for each of the three models investigated. All three 
models display a sharp transition in the probability of 
RAF formation as the level of catalysis increases, famil- 
iar from simulations of classic CRSs [8]. The uniform and 
inter-modular models display apparently identical results. 
Figure 4 Three models of catalysis in a partitioned CRS. Visual representations of the three systems investigated via simulations: inter-modular (left), intra-modular (centre) and uniform (right). Labelled arrows from $X_i$ to $X_j$ indicate that molecules in $X_i$ catalyse reactions involving molecules from $X_j$, with some non-zero probability (given by the label). The catalysis matrix P, which depends on the parameter p, is shown for each system. It can readily be shown that the overall level of catalysis (expected number of reactions catalysed per molecule) is the same in all three scenarios.



The level of catalysis required to give 50% probability of RAF formation in these models (≈ 1.3) is slightly higher than that in the intra-modular model (≈ 1.25), indicating that during this transition, RAFs are slightly more likely in a CRS with only intra-modular catalysis than a CRS with some catalysis between modules. When the level of catalysis is 1.29, around 75% of instances of the intra-modular model contain an RAF, which is to be expected: the same level of catalysis in an unpartitioned CRS with n = 10 gives 50% probability of RAF formation [14], and since here the intra-modular model essentially consists of two independent copies of the unpartitioned CRS, the probability of finding an RAF is $1 - (1 - 0.5)^2 = 0.75$.
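The independence argument generalises directly: if each of m independent modules contains an RAF with probability q, at least one of them does with probability 1 - (1 - q)^m. A minimal sketch (the function name is hypothetical):

```python
def prob_at_least_one_raf(q, copies):
    """Probability that at least one of `copies` independent modules,
    each containing an RAF with probability q, contains an RAF."""
    return 1 - (1 - q) ** copies


two_module_probability = prob_at_least_one_raf(0.5, 2)
```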

Figure 5 also shows that, as the level of catalysis 
is increased past the transition level, the fraction of 
instances containing an RAF in the uniform and inter- 
modular models approaches 100% slower than in the 
intra-modular model. However, by the time the catalysis 
level has reached 1.7, all three models produce RAFs close 
to 100% of the time. These results indicate that the pattern 
Figure 5 Emergence of RAFs in a partitioned system with n = 10. Plot showing how the proportion of CRSs containing an RAF depends on the level of catalysis, for each of the three models. The maximum length of polymers (n) is 10 and the food set consists of all monomers and dimers. The fraction of CRSs containing an RAF is from 500 instances of the model.



of emergence of RAFs in partitioned chemical reaction 
systems is very similar to that in previously studied sys- 
tems. Moreover, it is clear that even under widely varying 
patterns of catalysis, partitioned systems develop RAFs 
with high probability. 

The uniqueness of the results from the intra-modular 
model suggests that the property unique to this model - 
the complete absence of inter-modular catalysis - has a 
discrete effect on the probability of RAF formation. Note 
that the uniform and inter-modular models both have 
inter-modular catalysis, but the latter has twice the level 
of the former, as well as a lack of intra-modular catalysis. 
Despite these differences, their results appear to be identical. Taken together, these results suggest that the presence
or absence of inter-modular catalysis has more of an effect 
on the probability of RAF formation than the actual level 
of inter-modular catalysis. 

Despite the overall similarities in the pattern of emer- 
gence of RAFs between all three models, Figure 6 shows 
that the dependence of the size of the maxRAF on the 
level of catalysis is qualitatively different depending on 
the pattern of catalysis. At low catalysis levels, all three 
models tend to contain only RAFs consisting of a sin- 
gle reaction, catalysed by one of its own reactants or 
products. At the threshold level of catalysis at which all 
3 models begin to develop RAFs with higher probabil- 
ity, the number of reactions contained in the maxRAF 
in the intra-modular model increases faster than in the 
other models (which again display very similar results). 
However, after a short delay, the number of reactions in 
the latter models rapidly increases, matching the equiva- 
lent value in the intra-modular model and then exceeding 
it. As the level of catalysis is increased further past the 
transition point, the rate of growth in the uniform and 
inter-modular models gradually decreases again, and all 
3 models appear to converge on the same values of the 
average maxRAF size. This asymptotic behaviour makes 
sense: at higher levels of catalysis, the module to which 
any particular catalyst belongs has less bearing on whether 
Figure 6 Size of the maxRAF in a partitioned system with n = 10. Plots showing how the average number of reactions and number of
molecules in the maxRAF change as the level of catalysis moves through the transition range, expressed as a proportion of the total number of 
reactions and molecules in the CRS. Averages are taken over 500 instances of each model where n = 10 and the food set consists of all monomers 
and dimers. Instances containing no maxRAF were excluded from the calculation of the average, hence the data points that appear close to zero 
indicate a small but non-zero average size. 



or not the reaction in question is part of an RAF set, as the network becomes "saturated" with catalysis. Figure 6 also shows how the average number of molecules contained in the maxRAF (expressed as a proportion of all the molecules in X) depends on the level of catalysis (more formally this is $|\mathrm{cl}_{\mathcal{R}'}(F)| / |X|$, where $\mathcal{R}'$ is the maxRAF). The pattern of growth is similar to that seen in the number of reactions. However, one important contrast is that, while the maxRAF quickly grows to involve the majority of the molecules in X, at a given level of catalysis it contains only a relatively small proportion of the reactions in $\mathcal{R}$. Thus as the level of catalysis is increased beyond that shown in Figure 6, we should expect the average proportion of molecules involved in the maxRAF to quickly approach 1.0, while the average proportion of reactions in the maxRAF continues to increase linearly. Not until a much higher level of catalysis will the maxRAF contain 100% of the reactions in the system. These results make sense, since the number of possible reactions in a polymer system is $O(n 2^n)$, while the number of molecules is $O(2^n)$.
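The gap between these two growth rates can be checked by direct counting. The sketch below counts the molecules and forward ligation reactions in a single complete module over k monomer types. The function names are hypothetical, and counting only the ligation direction is a simplifying assumption: a molecule of length s can be formed by ligation in s - 1 ways.

```python
def count_molecules(k, n):
    """Number of polymers of length 1..n over k monomer types."""
    return sum(k ** s for s in range(1, n + 1))


def count_ligations(k, n):
    """Number of forward ligation reactions: each molecule of length s
    can be assembled from two shorter pieces in s - 1 ways."""
    return sum((s - 1) * k ** s for s in range(1, n + 1))


# Reactions per molecule grows roughly linearly with n for k = 2.
ratio_n10 = count_ligations(2, 10) / count_molecules(2, 10)
ratio_n15 = count_ligations(2, 15) / count_molecules(2, 15)
```

For k = 2 the ratio of reactions to molecules is close to n - 2, which is why a maxRAF can involve most molecules while still containing only a small fraction of the reactions.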

Overall, the above results show that when n = 10, a 
partitioned CRS behaves very similarly to a classic CRS 
in terms of RAF emergence. In order to address the ques- 
tion of whether this is true for general values of n, we 
repeated the experiments at n = 15 (Figures 7 and 8). For 
each of the three models, the level of catalysis required 
to attain a given probability of RAF formation is higher, 
which is to be expected given previous theoretical and 
experimental work on the original unpartitioned model. 
However, while the intra-modular model undergoes a 
sharp transition similar to the n = 10 case, both the uni- 
form and inter-modular models undergo a more gradual 
increase in the probability of RAF formation as the level 
of catalysis increases from around 1.3 up to 2.0 (Figure 7). 
Once again, the latter two models exhibit almost iden- 
tical results, which is surprising given the difference in 



their pattern of catalysis. The level of catalysis required to 
give a 50% probability of RAF formation in the uniform 
and inter-modular models has increased from around 1.3 
(n = 10) to 1.45 (n = 15), while the increase in the
same figure for the intra-modular model is smaller, going 
from around 1.25 to around 1.32. However, the increased 
level of catalysis in the uniform and inter-modular mod- 
els remains chemically realistic. Figure 7 also suggests that 
as the level of catalysis is further increased, the fraction of 
CRSs containing an RAF for these models will approach 1 
monotonically, as observed for n = 10 (Figure 5). 

Figure 8 shows how the average size of the maxRAF (in 
terms of number of reactions, and number of molecules) 
changes as the level of catalysis is increased. Whereas for 
n = 10 the maxRAF initially grew most quickly for the
intra-modular model, these plots do not show the same 
early growth spurt: instead, all models appear to begin the 
transition at around the same level of catalysis. It is pos- 
sible that the resolution was not high enough to detect 
Figure 7 Emergence of RAFs in a partitioned system with n = 15.

Plot showing how the proportion of CRSs containing an RAF depends 
on the level of catalysis, when the maximum length of polymers in 
the system (n) is 15. The fraction of CRSs containing an RAF is from at 
least 120 instances of the model. 
Figure 8 Size of the maxRAF in a partitioned system with n = 15. Plots showing how the average number of reactions and average number of
molecules in the maxRAF change as the level of catalysis is increased, expressed as a proportion of the total number of reactions and molecules in
the CRS when n = 15. Instances containing no maxRAF were excluded from the calculation of the average.



the phenomenon: simulating smaller increments in p and 
a greater number of instances around this transition zone 
may reveal that it still occurs at n = 15. Other than this, 
the plots are similar to those in Figure 6. RAF sets in the 
uniform and inter-modular models grow faster both in 
number of reactions and number of molecules, and not 
until a level of catalysis around 2.0 does the intra-modular 
model catch up. This is much later than in the n = 10 case, 
which is particularly interesting given that at this level of 
catalysis the intra-modular model is developing RAFs with
higher probability than the other models (Figure 7). 

Discussion 

We chose here to investigate only the cases when n = 10 
and n = 15, since computational constraints limit the 
feasibility of repeating the experiments for more and/or 
larger values of n. However, inferences can be made about 
other values of n, especially in the light of Theorem 1, 
which shows that a linear increase (with n) in the level of 
catalysis is necessary and sufficient to maintain RAFs with 
a given probability in a partitioned CRS. After producing 
similar results to the above for further values of n, it would 
be interesting to use least squares regression to explicitly 
express the linear dependence (on n) of the level of catalysis required to give 50% probability of RAF formation for various patterns of catalysis, and compare these with the linear formulae produced in [14] for the original model. Based on Figures 5 and 7, we expect to see a steeper relationship for the uniform and inter-modular models than for the intra-modular model.
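As a rough indication of what such a regression might look like, the sketch below fits a least squares line to the approximate 50% thresholds quoted above (about 1.3 and 1.45 for the uniform and inter-modular models, and about 1.25 and 1.32 for the intra-modular model, at n = 10 and n = 15). With only two approximate points per model the fitted line simply passes through both, so the slopes are illustrative only; a meaningful fit would need thresholds at further values of n. The function name is hypothetical.

```python
def least_squares_line(points):
    """Ordinary least squares fit y = slope * x + intercept to (x, y) pairs."""
    count = len(points)
    mean_x = sum(x for x, _ in points) / count
    mean_y = sum(y for _, y in points) / count
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Approximate 50% thresholds read from the text (n, level of catalysis).
uniform_or_inter = [(10, 1.30), (15, 1.45)]
intra = [(10, 1.25), (15, 1.32)]

slope_mixed, intercept_mixed = least_squares_line(uniform_or_inter)
slope_intra, intercept_intra = least_squares_line(intra)
```

On these two points the uniform and inter-modular slope is larger than the intra-modular slope, consistent with the expectation of a steeper relationship stated above.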

While all three models begin to develop RAFs with high 
probability above the threshold level of catalysis, it is clear 
that the intra-modular model develops RAFs somewhat 
more reliably (with higher probability at lower catalysis 
levels) than the other models. Furthermore, this differ- 
ence is more apparent at n = 15 than n = 10, and in 
the light of the result of Theorem 1, the difference looks 
likely to become more marked as n increases. On the other 



hand, as pointed out by philosopher Roger White [26], 
the probability of a mechanism proposed to play a role in 
the origin of life may not be a sound metric by which to 
judge the validity of that mechanism (Elliott Sober makes 
a related argument in response to Richard Dawkins in [27] 
pp. 50-51). In terms of RAF theory, this means that the 
probability of RAF formation might not be the best way to 
decide which models have the most potential to shed light 
on the origin of life question. 

However, the results show another difference between 
the models that is worth noting. Figures 6 and 8 both 
suggest that the size of RAF sets is significantly lower in 
the intra-modular model than in the uniform and inter- 
modular models (excluding the brief window immediately 
around the threshold level of catalysis in which RAFs 
in the intra-modular model grow faster at n = 10).
This larger size of RAF sets in the uniform and inter- 
modular models is interesting: since RAF sets can often 
be decomposed into constituent RAFs (subRAFs), larger 
RAFs are likely to contain more of these autocatalytic 
subsets. It was suggested in [28,29] that this modular 
structure might be important for the potential evolvabil- 
ity of RAF sets. Specifically, the ability of large RAF sets 
to gain and lose smaller subRAFs might be a mecha- 
nism by which RAF sets can evolve and compete with 
each other, a process which might favour characteristic 
combinations of subRAFs, in a primitive form of selec- 
tion. This transition from a purely self-replicating set 
of molecules to a complex autocatalytic set which repli- 
cates imperfectly while remaining robust to changes in 
the environment is essential, if RAF sets are to give rise 
to a replicator capable of gradual, open-ended Darwinian 
evolution. 

We have investigated three different patterns of catal- 
ysis. Due to the inherent flexibility of the partitioned 
model, there are various other qualitatively different 
patterns that could be explored. In each of the above 
systems, the catalysis matrix P is symmetric. Even with 
this restriction in place, there is a continuum between
exclusively inter-modular and exclusively intra-modular 
catalysis, and we examined only the middle point and 
the two extremes of that continuum here. We expect to 
observe a similar pattern of RAF emergence in other sys- 
tems, where both intra- and inter-modular catalysis occur, 
but not in equal amounts. Based on Figures 5 and 7, if we were to begin with an exclusively intra-modular system ($P = p\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$) and gradually increase the level of inter-module catalysis (the off-diagonal entries of P), while holding the overall level of catalysis constant, we should expect to see a shift in the pattern of RAF development, becoming more like the uniform and inter-modular models examined here. This change should be complete by the time the catalysis becomes uniform, so must occur somewhere between 'intra-modular' ($P = p\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$) and 'uniform' ($P = \frac{p}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$). It would be interesting to determine at what point this transition occurs, and how sharp it is. A further extension would be to investigate systems
in which P is not symmetric: for example, where one 
module dominates as a source of catalysts for the system 
(e.g.P = ( \\ )). Given the main motivation behind this 
investigation, and the observation that peptides appear to 
be far more catalytically active than nucleic acids [25], this 
particular extension seems highly relevant. 

Based on structural complementarity between polypep- 
tide and RNA helices [30] and more recent experimental 
work demonstrating high catalytic proficiency of ances- 
trally related primitive forms of enzymes involved in 
translation [5,31,32], Carter and colleagues have sug- 
gested that the interactions between polypeptides and 
RNA may have played a key role in early chemical evo- 
lution in a "peptide-RNA world". Our theoretical results 
show that a system with two different types of polymer 
with reciprocity of function similar to that of proteins and 
RNA, produces autocatalytic sets at similarly realistic lev- 
els of catalysis to a simpler system composed of a single 
type of polymer (such as an RNA-world or system of pep- 
tides). Therefore, the results presented here suggest that 
the alternative scenario proposed by Carter and colleagues 
is feasible. 

Extensions: closure, inhibition and reaction rates 

The current definition of an RAF is limited because it 
ignores inhibition and reaction rates. The latter is prob- 
lematic because those reactions generating required reac- 
tants which proceed too slowly, or those which use up 
required reactants and proceed too fast, may prevent an 
RAF set from persisting in a dynamic environment. While 
the lack of inhibition and kinetics may be seen as a severe 
restriction, it is useful because it allows us to compute 
RAFs in polynomial time. These RAFs could then be 
examined to test if they are viable given known inhibition 
or reaction rate data. 



Alternatively, we could build this into the definition of a 
stronger type of RAF and ask if there is an efficient algo- 
rithm to find them. In this section we explore the latter 
approach. We consider RAFs that are viable under reac- 
tion rates and show that determining whether or not they 
exist in an arbitrary catalytic reaction system turns out to 
be NP-complete. 

Consideration of these factors (inhibition and reac- 
tion rates) requires distinguishing between RAFs that are 
'closed' and those that are not (this distinction is not 
important in the absence of inhibition and dynamics). 
Thus we first introduce and discuss this property, before 
considering the definition and properties of RAFs that 
allow inhibition or reaction rates. 

Closed RAFs 

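Given a CRS $\mathcal{Q} = (X, \mathcal{R}, C, F)$, a subset $\mathcal{R}'$ of $\mathcal{R}$ is a closed RAF if and only if the following conditions hold:

1) $\mathcal{R}'$ is an RAF;

2) for every $r \in \mathcal{R}$ for which there is a pair $(x, r) \in C$ such that $\{x\} \cup \rho(r) \subseteq \mathrm{cl}_{\mathcal{R}'}(F)$, $r \in \mathcal{R}'$.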

Informally, a closed RAF captures the idea that "any reaction that can occur, will occur". If all the reactants and at least one catalyst of a reaction $r \in \mathcal{R}$ are generated by the reactions in $\mathcal{R}'$, then it seems reasonable to expect that the reaction $r$ will occur, and so we should expect that $r$ is included in $\mathcal{R}'$. If $r$ is not included, then it is natural to consider adding it to $\mathcal{R}'$, in order that the extended set $\mathcal{R}' \cup \{r\}$ comes closer to containing all the reactions for which it generates all the necessary molecules. In order to formalise this notion, we introduce the idea of the closure of an RAF, defined as the smallest closed RAF which contains the RAF. Given an RAF $\mathcal{R}'$, we can construct its closure $\overline{\mathcal{R}'}$ as follows: let $K_0 = \mathcal{R}'$, and let $K_{i+1} = K_i \cup L_i$, where $L_i$ is the set of all $r \in \mathcal{R} \setminus K_i$ such that there exists a pair $(x, r) \in C$ and $\{x\} \cup \rho(r) \subseteq \mathrm{cl}_{K_i}(F)$. Then, $\overline{\mathcal{R}'}$ is the final set $K_n$ in the sequence of nested sets $\mathcal{R}' = K_0 \subset K_1 \subset K_2 \subset \cdots \subset K_n$, where $n$ is the first value of $i$ for which $K_i = K_{i+1}$.

Note that an RAF $\mathcal{R}'$ is a closed RAF if and only if $\mathcal{R}' = \overline{\mathcal{R}'}$. Note also that while the union of two RAFs is also an RAF, the union of two closed RAFs is not necessarily a closed RAF (though it is an RAF).
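The iterative construction of the closure can be sketched as follows. The data representation is an assumption made for illustration: reactions are stored in a dictionary mapping a reaction name to a pair (reactants, products) of frozensets, and catalysis is a set of (molecule, reaction name) pairs. The function names and the toy example are hypothetical.

```python
def cl(food, reaction_ids, reactions):
    """Closure of the food set under the given reactions."""
    present = set(food)
    changed = True
    while changed:
        changed = False
        for r in reaction_ids:
            reactants, products = reactions[r]
            if reactants <= present and not products <= present:
                present |= products
                changed = True
    return present


def raf_closure(raf, food, reactions, catalysis):
    """Grow K_0 = raf by adding, at each step, every outside reaction whose
    reactants and at least one catalyst are generated by the current set."""
    K = set(raf)
    while True:
        available = cl(food, K, reactions)
        L = {r for r in reactions
             if r not in K
             and reactions[r][0] <= available
             and any((x, r) in catalysis for x in available)}
        if not L:  # K_i = K_{i+1}: the closure has been reached
            return K
        K |= L


# Toy example: r1 is an RAF on its own, but it is not closed.
food = {"a", "b"}
reactions = {
    "r1": (frozenset({"a", "b"}), frozenset({"ab"})),
    "r2": (frozenset({"ab", "a"}), frozenset({"aba"})),
    "r3": (frozenset({"aba", "b"}), frozenset({"abab"})),
    "r4": (frozenset({"c", "a"}), frozenset({"ca"})),
}
catalysis = {("ab", "r1"), ("b", "r2"), ("aba", "r3"), ("a", "r4")}
closed = raf_closure({"r1"}, food, reactions, catalysis)
```

In this example r2 is added in the first step and r3 in the second, while r4 is never added because its reactant c cannot be generated from the food set.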

One notable property of closed RAFs is that, unlike 
RAFs that are not closed, we can reconstruct the network 
of reactions given only a "list" of the molecules involved in 
the network, as follows. 

Lemma 1. A closed RAF $\mathcal{R}' \subseteq \mathcal{R}$ is determined entirely by the subset of molecules $F \cup \mathrm{supp}(\mathcal{R}')$ and the CRS $\mathcal{Q} = (X, \mathcal{R}, C, F)$.
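Proof. Consider the set $\mathcal{R}^*$ reconstructed from $F \cup \mathrm{supp}(\mathcal{R}')$ as follows:

1) Add to $\mathcal{R}^*$ every reaction $r \in \mathcal{R}$ for which $\mathrm{supp}(r) \subseteq F \cup \mathrm{supp}(\mathcal{R}')$.

2) Remove from $\mathcal{R}^*$ any reaction $r$ for which there does not exist an $x \in F \cup \mathrm{supp}(\mathcal{R}')$ such that $(x, r) \in C$.

We will show that $\mathcal{R}^* = \mathcal{R}'$ by establishing the set inclusions $\mathcal{R}' \subseteq \mathcal{R}^*$ and $\mathcal{R}^* \subseteq \mathcal{R}'$. First, consider some $r \in \mathcal{R}'$. Clearly $\mathrm{supp}(r) \subseteq \mathrm{supp}(\mathcal{R}')$. It remains to show that there exists some $x \in F \cup \mathrm{supp}(\mathcal{R}')$ such that $(x, r) \in C$. Since $\mathcal{R}'$ is an RAF for $(\mathcal{Q}, F)$, by Lemma 4.3 of [8], $\mathrm{cl}_{\mathcal{R}'}(F) = F \cup \pi(\mathcal{R}')$, which together with the definition of F-generated and of the support implies that

$$F \cup \mathrm{supp}(\mathcal{R}') = \mathrm{cl}_{\mathcal{R}'}(F). \qquad (1)$$

By definition of reflexively autocatalytic, it follows from (1) that for all $r \in \mathcal{R}'$, there exists $x \in F \cup \mathrm{supp}(\mathcal{R}')$ such that $(x, r) \in C$. Therefore every reaction in $\mathcal{R}'$ fits the criteria for inclusion in $\mathcal{R}^*$, and we conclude that $\mathcal{R}' \subseteq \mathcal{R}^*$.

Next consider some $r \in \mathcal{R}^*$. Then by the rules of construction of $\mathcal{R}^*$, $\mathrm{supp}(r) \subseteq F \cup \mathrm{supp}(\mathcal{R}')$ and there exists an $x \in F \cup \mathrm{supp}(\mathcal{R}')$ such that $(x, r) \in C$. By (1), such an $x$ is in $\mathrm{cl}_{\mathcal{R}'}(F)$, and also by (1), $\mathrm{supp}(r) \subseteq \mathrm{cl}_{\mathcal{R}'}(F)$ so certainly $\rho(r) \subseteq \mathrm{cl}_{\mathcal{R}'}(F)$. Then since $\mathcal{R}'$ is a closed RAF, $r \in \mathcal{R}'$ by definition. We conclude that $\mathcal{R}^* \subseteq \mathcal{R}'$, which together with the previous result proves that $\mathcal{R}^* = \mathcal{R}'$. □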
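The two-step reconstruction used in the proof can be written down directly. In this sketch the support of a reaction is taken to be its reactants together with its products, reactions are stored as a dictionary from names to (reactants, products) pairs of frozensets, and catalysis is a set of (molecule, reaction name) pairs; these representation choices, the function name and the toy system are assumptions made for illustration.

```python
def reconstruct_from_molecules(molecules, reactions, catalysis):
    """Rebuild a closed RAF from its molecule set F ∪ supp(R').

    Step 1 keeps reactions whose support lies inside `molecules`;
    step 2 drops those with no catalyst among `molecules`."""
    return {r for r, (reactants, products) in reactions.items()
            if (reactants | products) <= molecules
            and any((x, r) in catalysis for x in molecules)}


# Toy system in which {r1, r2, r3} is a closed RAF for the food set {a, b}.
reactions = {
    "r1": (frozenset({"a", "b"}), frozenset({"ab"})),
    "r2": (frozenset({"ab", "a"}), frozenset({"aba"})),
    "r3": (frozenset({"aba", "b"}), frozenset({"abab"})),
    "r4": (frozenset({"c", "a"}), frozenset({"ca"})),
}
catalysis = {("ab", "r1"), ("b", "r2"), ("aba", "r3"), ("a", "r4")}
molecules = {"a", "b", "ab", "aba", "abab"}
rebuilt = reconstruct_from_molecules(molecules, reactions, catalysis)
```

Here r4 is excluded in step 1 because c and ca are not in the list of molecules, so the reconstruction returns exactly the closed RAF.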

Corollary 1. If $\mathcal{R}'$ is an RAF, then given only $F \cup \mathrm{supp}(\mathcal{R}')$ and the CRS $\mathcal{Q} = (X, \mathcal{R}, C, F)$, we can construct its closure $\overline{\mathcal{R}'}$.

Proof. If $\mathcal{R}'$ is a closed RAF, then $\mathcal{R}' = \overline{\mathcal{R}'}$ so the assertion holds trivially by the previous lemma.

Hence suppose $\mathcal{R}'$ is not closed. Then there is at least one reaction $r^* \in \mathcal{R} \setminus \mathcal{R}'$ such that there exists a pair $(x, r^*) \in C$ and $\{x\} \cup \rho(r^*) \subseteq \mathrm{cl}_{\mathcal{R}'}(F)$. Construct the set of reactions $\mathcal{R}^*$ from $F \cup \mathrm{supp}(\mathcal{R}')$ (as in Lemma 1). Since we did not use the fact that the RAF was closed in the first part of the proof of the lemma, we can apply the same argument to see that $\mathcal{R}' \subseteq \mathcal{R}^*$.

Now consider some $r \in \mathcal{R}^*$. Then by the rules of construction of $\mathcal{R}^*$, $\mathrm{supp}(r) \subseteq F \cup \mathrm{supp}(\mathcal{R}')$, and there exists some $x \in F \cup \mathrm{supp}(\mathcal{R}')$ such that $(x, r) \in C$. Then by Equation (1) in the proof of Lemma 1 (again, this applies since we did not assume the RAF was closed in that part of the proof), $\mathcal{R}^*$ contains every $r^* \in \mathcal{R} \setminus \mathcal{R}'$ such that there exists a pair $(x, r^*) \in C$ and $\{x\} \cup \rho(r^*) \subseteq \mathrm{cl}_{\mathcal{R}'}(F)$. At this point, we identify $\mathcal{R}'$ with the set $K_0$ and $\mathcal{R}^*$ with the set $K_1 = K_0 \cup L_0$ described in the preamble to Lemma 1. We can then follow the same process described in the preamble, constructing a sequence of nested sets $\mathcal{R}' = K_0 \subset \cdots \subset K_n$, where $K_n$ is by definition equal to $\overline{\mathcal{R}'}$. □

Inhibition 

In order to discuss the impact of molecules inhibiting 
reactions, we begin with the following definitions. 

Given a CRS $\mathcal{Q} = (X, \mathcal{R}, F, C)$, an inhibition assignment is a subset $I$ of $X \times \mathcal{R}$ where $(x, r) \in I$ means that molecular species $x$ inhibits reaction $r$. We say that a subset $\mathcal{R}'$ of $\mathcal{R}$ is an $I$-viable RAF for $\mathcal{Q}$ if and only if all of the following hold:

(a) $\mathcal{R}'$ is an RAF for $\mathcal{Q}$;

(b) $\mathcal{R}'$ is closed;

(c) No reaction in $\mathcal{R}'$ is inhibited by any molecule in $\mathrm{cl}_{\mathcal{R}'}(F)$.

The motivation for insisting that $\mathcal{R}'$ be closed is as follows: Suppose that $\mathcal{R}'$ involves a reaction that is inhibited by some product $x'$ of a reaction $r'$ that is not in $\mathcal{R}'$. Now if the reactants, and at least one catalyst, of $r'$ are present as products of reactions in $\mathcal{R}'$ (or elements of $F$) then there is no reason for $r'$ not to proceed and for $x'$ not to be produced. In that case $\mathcal{R}' \cup \{r'\}$, and any set containing it, would no longer be an $I$-viable RAF.
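The role of closure in this definition can be seen in a small example. The sketch below checks only the inhibition condition, under the assumption that it refers to the molecules generated from the food set by the reactions of the candidate set; it does not verify that the candidate is an RAF or that it is closed. The representation (reactions as a dictionary of (reactants, products) frozenset pairs, inhibition as a set of (molecule, reaction name) pairs) and all names are hypothetical.

```python
def generated_molecules(food, raf, reactions):
    """Closure of the food set under the reactions in `raf`."""
    present = set(food)
    changed = True
    while changed:
        changed = False
        for r in raf:
            reactants, products = reactions[r]
            if reactants <= present and not products <= present:
                present |= products
                changed = True
    return present


def inhibition_violations(raf, food, reactions, inhibition):
    """Pairs (x, r) where r is in `raf` and its inhibitor x is generated by `raf`."""
    available = generated_molecules(food, raf, reactions)
    return {(x, r) for (x, r) in inhibition if r in raf and x in available}


food = {"a", "b"}
reactions = {
    "r1": (frozenset({"a", "b"}), frozenset({"ab"})),
    "r2": (frozenset({"ab", "a"}), frozenset({"aba"})),
}
inhibition = {("aba", "r1")}  # the product of r2 inhibits r1

alone = inhibition_violations({"r1"}, food, reactions, inhibition)
together = inhibition_violations({"r1", "r2"}, food, reactions, inhibition)
```

Taken alone, r1 shows no violation, but if r2 can also proceed (so that the closure contains both reactions) its product inhibits r1. This is the situation that requiring closure is meant to rule out.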

The concept of an RAF subject to inhibition was formal- 
ized and studied briefly in [13], but there condition (b) was 
not imposed. This paper established that the problem of 
determining whether or not a CRS contains an RAF that 
is $I$-viable for $\mathcal{Q}$ is an NP-complete problem. It is pertinent therefore to ask whether the addition of condition
(b) alters this result, or affects the proof. In fact, it can be 
shown that it does not, since the reduction in [13] involves 
the construction of an RAF that is automatically closed. 

It is also of interest to know how inhibition affects the probability of forming a viable RAF, when $I$ is a random assignment. Notice that inhibition is a much stronger notion than catalysation, since if a reaction is inhibited by just one molecule, then no matter how many molecules might catalyse that reaction, it is prevented from taking place. Thus we might expect that even low rates of inhibition could be a major obstruction to the formation of a viable RAF. However, we show here that provided the inhibition rate is sufficiently small, Theorem 2 still holds. To state this we first formalize the model by extending (C1) and (C2) to the following three conditions (which reduce to (C1) and (C2) upon setting $\epsilon = 0$).

(C1) The events $\mathcal{E}(x, r)$ that $x$ catalyses reaction $r$, and the events $F(x, r)$ that $x$ inhibits reaction $r$ are independent across all pairs $(x, r)$ in $X \times \mathcal{R}_+$.

(C2) As stated previously near the start of section "The probability of RAFs in general catalytic reaction systems".

(C3) For some constant $\epsilon \geq 0$, the expected number of molecular species that inhibit any given reaction is at most $\epsilon$.

Notice that part (a) of Theorem 1 applies automatically 
to the more restrictive notion of an inhibition viable RAF. 
However part (b) does not, and here we present a stronger 
result, which implies Theorem 1(b) (upon taking $\epsilon = 0$).
The proof of this theorem is presented in the Appendix. 

Theorem 2. Consider a CRS $\mathcal{Q}$ that satisfies the extended conditions (C1)-(C3), and has a species stratification. Suppose further that the inhibition rate $\epsilon$ in (C3) satisfies $0 \le \epsilon < \exp(-Kc)$, where $c$ is the average (over all reactions) expected number of molecular species that catalyse each reaction.

• IfJZ > ^ • ^jxp ^ en ^ e probability that there exists 
an RAF for Q is at least 1 — ir(X), where 

f(X) = k ^ k ^ e -!*K -> 0 exponentially fast as 
X — > oo. 

• When € = 0 (no inhibition) the factor of 2 in the 
numerator and denominator of if (X) can be removed. 

Kinetic RAF framework 

Here we extend previous work by introducing the con- 
cept of a kinetic CRS, in which every reaction has an 
associated rate, and all molecules diffuse away into the 
environment at constant rate. We then define a kinetic 
RAF, which, informally, is an RAF in which every molecule 
is produced at least as fast as it is lost (to diffusion, or by 
consumption in other reactions). This represents the idea 
that being able to build up a sufficient local concentration 
of molecules is a necessary condition for RAFs to form. 

Definition: A kinetic CRS is a tuple $\mathcal{Q} = (X, \mathcal{R}, F, C, v)$ where $X$, $\mathcal{R}$, $F$ and $C$ are defined in the same way as for a simple CRS, and $v : \mathcal{R} \to \mathbb{R}_{>0}$ is a rate function, where for each $r \in \mathcal{R}$, $v(r)$ is the rate of $r$.

For any subset $\mathcal{R}' \subseteq \mathcal{R}$, the stoichiometric matrix $S_{\mathcal{R}'}$ is the $|\mathrm{supp}(\mathcal{R}') \setminus F| \times |\mathcal{R}'|$ matrix with rows indexed by the non-food molecule types involved in $\mathcal{R}'$ and columns indexed by the reactions in $\mathcal{R}'$, where $S_{ij} \in \mathbb{Z}$ is the net number of molecules of type $i$ produced by reaction $j$. The rate vector $v_{\mathcal{R}'} = [v(r_1), v(r_2), \ldots, v(r_{|\mathcal{R}'|})]^T$ lists the rates of each reaction in $\mathcal{R}'$. Then, $S_{\mathcal{R}'} v_{\mathcal{R}'}$ is a vector of the net rates of production of each molecule type in $\mathrm{supp}(\mathcal{R}') \setminus F$. Let $\delta \geq 0$ be the diffusion rate.

A subset $\mathcal{R}' \subseteq \mathcal{R}$ is a kinetic RAF (kRAF) if and only if the following properties hold (where $\mathbf{1}$ is a $|\mathrm{supp}(\mathcal{R}') \setminus F| \times 1$ column vector of 1s):

(a) $\mathcal{R}'$ is an RAF for $\mathcal{Q}$;

(b) $\mathcal{R}'$ is closed;

(c) The following inequality holds:

$$S_{\mathcal{R}'} v_{\mathcal{R}'} - \delta \mathbf{1} \geq 0. \qquad (2)$$
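Condition (c) can be checked without forming the stoichiometric matrix explicitly, by accumulating the net production rate of each non-food molecule. The sketch below does this for a toy system; it checks only the rate inequality and assumes that the candidate set is already known to be a closed RAF. Reactions are represented as pairs of dictionaries mapping molecule types to stoichiometric counts, and all names and rate values are hypothetical.

```python
def net_production(raf, reactions, rates, food):
    """Entries of S v for the non-food molecules involved in `raf`."""
    net = {}
    for r in raf:
        reactants, products = reactions[r]
        for molecule, count in products.items():
            net[molecule] = net.get(molecule, 0) + count * rates[r]
        for molecule, count in reactants.items():
            net[molecule] = net.get(molecule, 0) - count * rates[r]
    return {m: value for m, value in net.items() if m not in food}


def satisfies_rate_condition(raf, reactions, rates, food, delta=0):
    """True if every non-food molecule is produced at least as fast as it
    is consumed and lost to diffusion, i.e. S v - delta * 1 >= 0."""
    return all(value - delta >= 0
               for value in net_production(raf, reactions, rates, food).values())


food = {"a", "b"}
reactions = {
    "r1": ({"a": 1, "b": 1}, {"ab": 1}),
    "r2": ({"ab": 1, "a": 1}, {"aba": 1}),
}
fast_supply = {"r1": 2, "r2": 1}   # ab is made faster than it is used
slow_supply = {"r1": 1, "r2": 2}   # ab is used faster than it is made

ok_without_diffusion = satisfies_rate_condition(["r1", "r2"], reactions, fast_supply, food)
ok_with_diffusion = satisfies_rate_condition(["r1", "r2"], reactions, fast_supply, food, delta=2)
ok_slow = satisfies_rate_condition(["r1", "r2"], reactions, slow_supply, food)
```

With the first set of rates the inequality holds at zero diffusion but fails once the diffusion rate exceeds the net production rate; with the second set it fails even without diffusion because ab is consumed faster than it is produced.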



Note that we do not include food molecules in the rows of $S_{\mathcal{R}'}$. An RAF $\mathcal{R}'$ is not guaranteed to contain any reactions which generate food molecules, but will necessarily contain at least one reaction with at least one food reactant. In that case, if we were to include the rows corresponding to those food molecules, they would have only negative entries, causing the RAF $\mathcal{R}'$ (which might otherwise satisfy the properties of a kRAF) to formally fail to be a kRAF.

The diffusion rate $\delta$ represents the rate at which molecules diffuse away into the environment. Diffusion is unavoidable in chemical systems, and as molecules diffuse away, their concentrations drop until they are no longer available to sustain local reactions. A CRS occurring in the ocean or a "pond" might have a larger $\delta$ than one occurring in a hydrothermal vent, which may in turn have a larger $\delta$ than a CRS confined within a lipid membrane [33].

The idea of searching for kRAFs within a kinetic CRS is related to the idea in chemical organisation theory (COT) of searching for self-sustaining chemical organisations within an algebraic chemistry [9]. The definitions of the stoichiometric matrix coincide, and the qualifying condition (2) for a kRAF is similar to the qualifying condition for an organisation to be self-sustaining [9] (however in COT there is no diffusion term; note that $S_{\mathcal{R}'} v_{\mathcal{R}'} \geq 0$ is necessary but not sufficient for a subset $\mathcal{R}' \subseteq \mathcal{R}$ to be a kRAF). Furthermore, in COT the entries of the vector $v$ are not fixed: we are free to choose a set of values that makes the system self-sustaining, and indeed the definition of self-sustaining is simply that such a set of values can be found. In contrast, the reaction rates in a kinetic CRS are pre-determined constraints within which we can (in principle) go looking for a subset $\mathcal{R}'$ of reactions that satisfies (2). While we propose that this set-up is more relevant to the origin of life, the following theorem shows that such a search is unlikely to be useful in general. We show that determining whether or not $\mathcal{R}$ contains a kRAF is NP-complete when $\delta = 0$ (we expect a similar result applies when $\delta > 0$ but our proof, presented in the Appendix, applies to the zero-diffusion case).

Theorem 3. Given a kinetic CRS $Q = (X, \mathcal{R}, C, F, v)$
with diffusion rate $\delta = 0$, the problem of determining
whether or not $\mathcal{R}$ contains a kRAF is NP-complete.

The closely related problem in COT of deciding whether 
or not an algebraic chemistry contains an organisation is 
also NP-complete [34]. Although Theorem 3 shows that 
we cannot hope to efficiently find kRAFs within a kinetic 
CRS, it is easy to check (in polynomial time) whether or 
not a given RAF is a kRAF, and since RAFs can be found in 
polynomial time [8], it may be feasible to discover kRAFs 
in a kinetic CRS by first ignoring the rate function v and 
finding a sample of RAFs, then deciding whether or not
any are viable under v. 

One weakness of the kRAF concept is that reaction rates 
are fixed - in real systems, the rate of a reaction is a 
function of the concentrations of its reactants, catalysts 
and inhibitors. Although the concept of concentration 
currently has no direct meaning in the RAF framework, 
previous work has used dynamical simulations to study 
the changes in concentrations of molecules in small RAF 
sets [15,18]. 

Conclusions 

Due to the utility of polymers in modern life, much of 
the theoretical and experimental work on the origin of life 
problem has focussed on systems of polymers, and in [13]
it was shown that the level of catalysis need only increase 
linearly as the number of molecules increases, in order 
to maintain a high probability of RAFs occurring. We 
have presented a generalisation of this result, showing that 
under mild assumptions, the same linear bound applies to 
a system in which the molecules are not necessarily poly- 
mers. Furthermore, partitioned systems were shown to 
support the development of RAFs similarly to typical sys- 
tems containing only one type of polymer, and the effect of 
the pattern of catalysis on the emergence of RAF sets was 
explored. Previous research into template-based catalysis 
[14,20] and recent work incorporating more realistic pat- 
terns of catalysis [35] have indicated that the emergence of 
RAFs is quite robust to the structure of the underlying
reaction system, a conclusion which this paper supports. 

This research was performed in an effort to better 
understand the "symbiotic coexistence" of peptides and 
nucleic acids in living organisms, as well as the poten- 
tial role of this reciprocity in early chemical evolution (as 
highlighted recently by [5]). While the results presented 
here are a far cry from deep insights revealing funda- 
mental truths about the origin of life, this extension of 
previous work on chemical reaction systems represents an 
incremental gain in understanding, which can hopefully 
contribute to an eventual bigger picture. In particular, this 
paper supports the experimental work of Li et al. [5] and 
encourages further experimental work on the topic. 

We have also introduced and studied two new concepts 
in RAF theory: closed RAF sets, and kinetic chemical 
reaction systems. A closed RAF set is an RAF set in the 
standard sense, with the additional property that "every 
reaction that can occur, does occur". More specifically, 
this means that if the existing subset of reactions is able 
to produce all the reactants and at least one catalyst of a 
reaction outside of the subset, then that reaction should be 
included in the subset. A closed RAF is a subset of reactions
that has "absorbed" every such reaction. 
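
The "absorbing" procedure described above can be sketched as a simple fixed-point iteration. This is only an illustrative sketch under stated assumptions: reactions are represented as hypothetical `(reactants, products, catalysts)` tuples, the function names are invented for this example, and the starting subset is assumed to already be an RAF.

```python
def closure_of_food(reactions, food):
    """Molecules obtainable from the food set using the given reactions
    (catalysis is ignored when computing this closure)."""
    available = set(food)
    changed = True
    while changed:
        changed = False
        for reactants, products, _catalysts in reactions:
            if set(reactants) <= available and not set(products) <= available:
                available |= set(products)
                changed = True
    return available


def close_raf(all_reactions, subset, food):
    """Repeatedly absorb every outside reaction whose reactants and at
    least one catalyst can be produced by the current subset."""
    closed = list(subset)
    while True:
        available = closure_of_food(closed, food)
        absorbed = [
            r for r in all_reactions
            if r not in closed
            and set(r[0]) <= available
            and any(c in available for c in r[2])
        ]
        if not absorbed:
            return closed
        closed.extend(absorbed)


# Hypothetical toy system (not from the paper).
r1 = (("f1", "f2"), ("a",), ("a",))  # f1 + f2 -> a, catalysed by a
r2 = (("a", "f1"), ("b",), ("a",))   # a + f1 -> b, catalysed by a
r3 = (("b", "c"), ("d",), ("a",))    # b + c -> d, but c is never available
food = {"f1", "f2"}

print(close_raf([r1, r2, r3], [r1], food))
```

Starting from the subset containing only `r1`, the sketch absorbs `r2` (its reactants and catalyst are all producible) but not `r3`, whose reactant `c` cannot be produced.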

The kinetic RAF framework was developed in response 
to criticism levelled at RAF theory for not accounting for
the fact that reactions progress at different rates. Kinetics
is a fundamental part of real chemistry, so while the
strength of RAF theory perhaps lies in its simplicity, the 
development of a kinetic extension is appropriate. A cen- 
terpiece of previous RAF theory investigations has been 
the search algorithm from [8], which runs in polynomial 
time and which has allowed chemical reaction systems of 
various sizes and properties to be investigated computa- 
tionally [8,14]. Therefore, a similar algorithm for detecting 
kinetically viable RAFs inside a kinetic CRS would be a 
promising start for the development of a theory of kinetic 
RAFs. Unfortunately, a reduction from the NP-complete 
problem 3-SAT showed that detecting a kinetic RAF 
within a kinetic CRS is unlikely to be productive in gen- 
eral. However, it is possible to construct RAFs efficiently, 
and for each RAF found one can readily test whether it 
is also a kRAF and therefore potentially capable of true 
autocatalytic growth. 

Endnotes 

1 Peptide nucleic acid (PNA) does exist, however this
polymer has a backbone of N-(2-aminoethyl)glycine 
(AEG) monomers linked by peptide bonds, with 
nucleobases attached to each monomer, rather than 
being composed of both nucleotide and amino acid 
monomers. Interestingly, the recent discovery of AEG 
production in diverse taxa of cyanobacteria may suggest 
an information-carrying role for PNA in early life [36]. 

2 tRNA aminoacylation or "charging" involves the 
esterification of an amino acid monomer to the relevant 
tRNA, prior to translation at the ribosome. This is of 
course an example of a reaction which combines 
molecules from both "independent" sets. 

Appendix 

Proof of Theorem 1 and Theorem 2 

Let $c(r)$ denote the expected number of molecular species
that catalyse reaction $r$, let $c_l = \min\{c(r) : r \in \mathcal{R}\}$
and $c_u = \max\{c(r) : r \in \mathcal{R}\}$ denote the lower and
upper bounds on these values, respectively, and let

$$\bar{c} = \frac{1}{|\mathcal{R}|} \sum_{r \in \mathcal{R}} c(r) = \frac{1}{|\mathcal{R}_+|} \sum_{r \in \mathcal{R}_+} c(r)$$

denote the average value. Then from the definition of $\mu$ we have:

_ _ 

u = c ■ , 

1*1 

and (C2) furnishes the three inequalities:

$$c_u / c_l \leq K, \quad c_u \leq K\bar{c}, \quad \text{and} \quad c_l \geq \bar{c}/K. \tag{3}$$

Next we establish the following variation on a lemma 
from [13]. 

Lemma 2. Consider a random CRS $Q = (X, \mathcal{R}, C, I, F)$,
satisfying (C1)-(C3). For a reaction $r \in \mathcal{R}$ let $q_r$ be the
probability that either no species in $X$ catalyses $r$ or at least
one species in $X$ inhibits reaction $r$.

(i) $q_r \geq 1 - c_u$,

(ii) $q_r \leq \exp(-c_l) + \epsilon$.



Let $p(x, r) = \mathbb{P}(\mathcal{E}(x, r))$ denote the probability that
species $x$ catalyses reaction $r$, and let $p'(x, r) = \mathbb{P}(\mathcal{F}(x, r))$
denote the probability that $x$ inhibits $r$. Note that $1 - q_r$
is the probability that at least one species in $X$ catalyses $r$
and no species in $X$ inhibits $r$ and so, by condition (C1),
we have:

$$1 - q_r = \Big(1 - \prod_{x \in X} (1 - p(x, r))\Big) \cdot \prod_{x \in X} (1 - p'(x, r)). \tag{4}$$

Thus,

$$q_r \geq \prod_{x \in X} (1 - p(x, r)) \geq 1 - \sum_{x \in X} p(x, r),$$

and $\sum_{x \in X} p(x, r)$ is the expected number of species that
catalyse $r$, which by (C2) is at most $c_u$. Thus, $q_r \geq 1 - c_u$,
which establishes part (i).

For part (ii) we have from (4):

$$q_r \leq \prod_{x \in X} (1 - p(x, r)) + \Big(1 - \prod_{x \in X} (1 - p'(x, r))\Big),$$

and since

$$\prod_{x \in X} (1 - p(x, r)) \leq \exp\Big(-\sum_{x \in X} p(x, r)\Big) \leq \exp(-c_l)$$

and

$$1 - \prod_{x \in X} (1 - p'(x, r)) \leq \sum_{x \in X} p'(x, r) \leq \epsilon$$

(by (C2) and (C3)) we obtain the claimed inequality in part
(ii).

To establish Theorem 1 part (a), observe that any RAF
must contain at least one catalysed reaction that has
its reactant(s) in $F$; we call such a reaction primary. By
species stratification conditions (i), (ii) and (S1), the number
of reverse reactions $f \to a + b$ such that $f \in F$ is
bounded by a function of $k$ and $t$; while by the species
stratification conditions (i), (iii) and (S1), the number of
forward reactions $f + f' \to g$ such that $f, f' \in F$ is also
bounded by a function of $k$ and $t$ (involving the constant
$M$ mentioned in (iii)). Thus the total number of primary
reactions is at most a constant $\tau$ dependent only on $k$ and
$t$. By Lemma 2(i) and condition (C1), the probability $Q_{\mathcal{R}}$
that none of the primary reactions are catalysed satisfies:

$$Q_{\mathcal{R}} \geq (1 - c_u)^{\tau}. \tag{5}$$

Now, the probability that at least one of the primary reactions
is catalysed is $1 - Q_{\mathcal{R}}$, and this is clearly a necessary
(but not sufficient) condition for there to be an RAF for
$Q$. It follows from (3) and (5) that:

$$\mathbb{P}(\exists\, \text{RAF for } Q) \leq 1 - Q_{\mathcal{R}} \leq 1 - (1 - c_u)^{\tau} \leq 1 - (1 - K\bar{c})^{\tau}.$$

Consequently, if JZ < X ■ then X < c, and so we 
arrive at Theorem 1(a), with $\varphi(\lambda) = 1 - (1 - K\lambda)^{\tau}$.

To establish Theorem 2 (which implies Theorem 1(b)),
let $q_- := \exp(-c_l) + \epsilon$. By the upper bound on $\epsilon$ stated in
the theorem (and the inequality $c_l \geq \bar{c}/K$ from (3)) we have:

$$q_- \leq \exp(-c_l) + \exp(-\bar{c}/K) \leq 2\exp(-\bar{c}/K). \tag{6}$$



By Lemma 2(ii), for any $s \geq t$ (recalling that $F = a_t$),
the probability that a species $x \in X(s + 1)$ cannot be produced
from reactants in $a_s$ is at most $(q_-)^{\nu s}$ (since by (S2)
we know that there exist at least $\nu s$ reactions producing $x$
from reactants in $a_s$, so the only way for $x$ to fail to be produced
is if each such reaction has either no catalyst in $X$
or an inhibitor in $X$).

Let $N_s$ denote the number of species in $X(s + 1)$ which
cannot be produced by catalysed and uninhibited reactions
from reactants in $a_s$. Then the expected value of $N_s$
satisfies the inequality $\mathbb{E}[N_s] \leq |X(s + 1)| (q_-)^{\nu s}$, and, by
(S1), the term on the right is bounded above by $k^{s+1} (q_-)^{\nu s}$.
In particular, since $\mathbb{P}(N_s > 0) \leq \mathbb{E}[N_s]$, the probability
(let us call it $W_{s+1}$) that there exists at least one species in
$X(s + 1)$ which cannot be produced from reactants in $a_s$
satisfies:

$$W_{s+1} \leq k^{s+1} (q_-)^{\nu s}. \tag{7}$$



Let us say a species in $X$ is problematic if each reaction
producing that species is either not catalysed by any
molecule in $X$ or is inhibited by at least one molecule in
$X$. Then the probability that there exists a problematic
species in $X$ is at most $\sum_{s=t}^{m-1} W_{s+1}$, which, from (7), satisfies:

$$\sum_{s=t}^{m-1} W_{s+1} \leq k \sum_{s=t}^{m-1} k^{s} (q_-)^{\nu s} \leq k \sum_{s=t}^{m-1} \big(2k \exp(-\nu\bar{c}/K)\big)^{s},$$

where the second inequality applies (6). Thus the probability
that there are no problematic species in $X$ is at least
$1 - \sum_{s=t}^{m-1} W_{s+1}$. A lower bound on this quantity is given by:

$$\begin{aligned}
1 - \sum_{s=t}^{m-1} W_{s+1} &\geq 1 - k \sum_{s=t}^{m-1} \big(2k \exp(-\nu\bar{c}/K)\big)^{s} \\
&\geq 1 - k \sum_{s=t}^{\infty} \big(2k \exp(-\nu\bar{c}/K)\big)^{s} \\
&= 1 - \frac{k \big(2k \exp(-\nu\bar{c}/K)\big)^{t}}{1 - 2k \exp(-\nu\bar{c}/K)}.
\end{aligned}$$

Noting that there being no problematic species in $X$ is a
sufficient condition for $Q$ to have an RAF $\mathcal{R}'$ (indeed one
with $\mathrm{supp}(\mathcal{R}') = X$), we see that $1 - \sum_{s=t}^{m-1} W_{s+1}$ is a lower
bound on $\mathbb{P}(\exists\, \text{RAF for } Q)$. Hence taking

$$\psi(\lambda) = \frac{k \big(2k e^{-\nu\lambda/K}\big)^{t}}{1 - 2k e^{-\nu\lambda/K}}$$

and noting that $\psi(\lambda) \to 0$ as $\lambda \to \infty$, part (b) follows
(observe that this RAF is closed, since it involves all
molecules in $X$). Finally, note that when $\epsilon = 0$ the inequality
in (6) can be improved to $q_- = \exp(-c_l) \leq \exp(-\bar{c}/K)$,
which eliminates the factor of 2. This completes the proof.
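
To see how quickly the bound in part (b) decays, the closed form for $\psi(\lambda)$ can be evaluated numerically and compared with a truncated version of the geometric sum it replaces. The sketch below is illustrative only: the function names and the parameter values for $k$, $\nu$, $K$, $t$ and $\lambda$ are arbitrary choices made for this example, not values from the paper.

```python
import math


def psi(lam, k, nu, K, t):
    """Closed form k * x**t / (1 - x) with x = 2k * exp(-nu * lam / K)."""
    x = 2 * k * math.exp(-nu * lam / K)
    if x >= 1:
        raise ValueError("geometric series does not converge for these values")
    return k * x ** t / (1 - x)


def truncated_sum(lam, k, nu, K, t, m):
    """Finite sum k * sum_{s=t}^{m-1} x**s, which psi bounds from above."""
    x = 2 * k * math.exp(-nu * lam / K)
    return k * sum(x ** s for s in range(t, m))


# Arbitrary illustrative parameters.
k, nu, K, t = 2, 1.0, 2.0, 2
for lam in (5.0, 10.0, 20.0):
    print(lam, truncated_sum(lam, k, nu, K, t, m=50), psi(lam, k, nu, K, t))
```

For these parameter choices the printed bound shrinks as $\lambda$ grows, in line with $\psi(\lambda) \to 0$; for small $\lambda$ the ratio of the series is at least 1 and the bound is vacuous, which the sketch signals with an exception.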

Proof of Theorem 3 

Firstly, given a kinetic CRS $Q = (X, \mathcal{R}, C, F, v)$ and a subset
$\mathcal{R}'$ of $\mathcal{R}$ it can be checked in polynomial time whether
$\mathcal{R}'$ is a kRAF for $Q$, and so the question of whether or not
$\mathcal{R}$ contains a kRAF is in the complexity class NP. We will
show that this question is NP-complete for the case $\delta = 0$
by exhibiting a polynomial-time reduction from the
NP-complete problem 3-SAT. Suppose we have an instance
of 3-SAT, which is an expression $P$ written in conjunctive
normal form involving a set $Y = \{y_1, \ldots, y_n\}$ of variables,
and with each clause consisting of a disjunction of at most
three literals (a variable $y_j$ or its negation $\bar{y}_j$). Thus we can
write $P$ in the form

$$P = C_1 \wedge C_2 \wedge \cdots \wedge C_k \quad \text{with} \quad C_i = \bigvee_{j \in T(i)} y_j \vee \bigvee_{j \in F(i)} \bar{y}_j.$$

For example, $P = (y_1 \vee \bar{y}_2 \vee y_3) \wedge (y_2 \vee y_3 \vee \bar{y}_4) \wedge (\bar{y}_1 \vee y_2 \vee \bar{y}_4)$
would be an instance of 3-SAT for $Y = \{y_1, y_2, y_3, y_4\}$.

Here $T(i)$ and $F(i)$ are subsets of $\{1, \ldots, n\}$ that describe
which elements of $Y$ are in $C_i$ as a positive literal or a negated
literal (respectively). Since each clause has at most three
literals, $|T(i)| + |F(i)| \leq 3$. We say that $P$ has a satisfying
assignment if there is a function $S : Y \to \{\text{true}, \text{false}\}$
so that for each clause $C_i$ in $P$, there exists $j \in T(i)$ for
which $S(y_j) = \text{true}$ or a $j \in F(i)$ for which $S(y_j) = \text{false}$.
In the example above, setting $S(y_1) = \text{true}$,
$S(y_2) = S(y_4) = \text{false}$, and $S(y_3)$ to be either true or
false provides a satisfying assignment for $P$.

Given $P$ we will construct a catalytic reaction system
$(X, \mathcal{R}, C)$, food set $F$, and rate function $v$ so that $Q_P =
(X, \mathcal{R}, C, F, v)$ has a kRAF if and only if $P$ has a satisfying
assignment.

We take $F = \{f_1, \ldots, f_n\}$, and let $\bar{Y} = \{\bar{y}_1, \ldots, \bar{y}_n\}$. Set

$$X = F \cup Y \cup \bar{Y} \cup \{y_j T : j = 1, \ldots, n\} \cup \{\bar{y}_j T : j = 1, \ldots, n\} \cup \{\theta_1, \ldots, \theta_k\} \cup \{\omega, T\}.$$

The reactions, associated rates, and catalysts are 
described as follows: 



$f_j \to y_j$ for each $1 \leq j \leq n$; at rate $k + 1$; catalysed by $y_j T$; (8)

$f_j \to \bar{y}_j$ for each $1 \leq j \leq n$; at rate $k + 1$; catalysed by $\bar{y}_j T$; (9)

$y_j + T \to y_j T$ for each $1 \leq j \leq n$; at rate $0 < \epsilon < 1/n$; catalysed by $T$; (10)

$\bar{y}_j + T \to \bar{y}_j T$ for each $1 \leq j \leq n$; at rate $0 < \epsilon < 1/n$; catalysed by $T$; (11)

$y_j \to \theta_i$ for each pair $(i, j)$ with $j \in T(i)$; at rate 1; catalysed by $T$; (12)

$\bar{y}_j \to \theta_i$ for each pair $(i, j)$ with $j \in F(i)$; at rate 1; catalysed by $T$; (13)

$\theta_1 + \cdots + \theta_k \to T$ at rate 1; catalysed by $T$; (14)

$y_j + \bar{y}_j \to \omega$ for each $1 \leq j \leq n$; at rate $k + 2$; catalysed by $T$. (15)
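
The construction in (8)-(15) can be made concrete with a short sketch. Everything below is an illustrative assumption rather than the authors' code: clauses are encoded as lists of signed integers (`j` for $y_j$, `-j` for $\bar{y}_j$), molecule names such as `ny2` and `theta1` are invented labels, and the small formula is a hypothetical instance. The sketch builds the reactions, selects the reactions compatible with a truth assignment, and checks only the rate condition with $\delta = 0$; it does not verify the RAF or closure properties.

```python
def build_reactions(clauses, n, eps):
    """Reactions (8)-(15) as (reactants, products, catalyst, rate) tuples."""
    k = len(clauses)
    rxns = []
    for j in range(1, n + 1):
        y, ny = f"y{j}", f"ny{j}"                            # y_j and its negation
        rxns.append(((f"f{j}",), (y,), y + "T", k + 1))      # (8)
        rxns.append(((f"f{j}",), (ny,), ny + "T", k + 1))    # (9)
        rxns.append(((y, "T"), (y + "T",), "T", eps))        # (10)
        rxns.append(((ny, "T"), (ny + "T",), "T", eps))      # (11)
        rxns.append(((y, ny), ("omega",), "T", k + 2))       # (15)
    for i, clause in enumerate(clauses, start=1):
        for lit in clause:
            mol = f"y{lit}" if lit > 0 else f"ny{-lit}"
            rxns.append(((mol,), (f"theta{i}",), "T", 1))    # (12) and (13)
    thetas = tuple(f"theta{i}" for i in range(1, k + 1))
    rxns.append((thetas, ("T",), "T", 1))                    # (14)
    return rxns


def candidate_subset(rxns, n, k, assignment):
    """Reactions whose reactants and products only involve food, T, the
    theta molecules, and the molecules selected by the assignment."""
    food = {f"f{j}" for j in range(1, n + 1)}
    allowed = set(food) | {"T"} | {f"theta{i}" for i in range(1, k + 1)}
    for j in range(1, n + 1):
        mol = f"y{j}" if assignment[j] else f"ny{j}"
        allowed |= {mol, mol + "T"}
    return [r for r in rxns if set(r[0]) | set(r[1]) <= allowed], food


def rate_condition_holds(subset, food):
    """Every non-food molecule is produced at least as fast as it is used."""
    net = {}
    for reactants, products, _catalyst, rate in subset:
        for m in reactants:
            if m not in food:
                net[m] = net.get(m, 0) - rate
        for m in products:
            if m not in food:
                net[m] = net.get(m, 0) + rate
    return all(x >= 0 for x in net.values())


# Hypothetical instance: (y1 or not y2) and (y2 or y3), so n = 3 and k = 2.
clauses = [[1, -2], [2, 3]]
rxns = build_reactions(clauses, n=3, eps=0.1)      # eps < 1/n
good, food = candidate_subset(rxns, 3, 2, {1: True, 2: False, 3: True})
bad, _ = candidate_subset(rxns, 3, 2, {1: False, 2: True, 3: False})
print(rate_condition_holds(good, food), rate_condition_holds(bad, food))
```

With the satisfying assignment every non-food molecule has non-negative net production, whereas the non-satisfying assignment leaves `theta1` consumed by reaction (14) without being produced, so the rate check fails.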



First suppose that $Q_P$ contains a kRAF $\mathcal{R}'$; we will show
that $P$ has a satisfying assignment. Since $\mathcal{R}'$ is an RAF,
it is non-empty. Therefore, the molecule $T$ must be produced,
since every reaction in $\mathcal{R}$ is catalysed by either $T$
or some molecule that is produced from $T$. This in turn
requires that for each $1 \leq i \leq k$, $\theta_i$ is produced, and therefore,
for each $1 \leq i \leq k$ there exists $j \in T(i)$ such that
$y_j$ is produced or $j \in F(i)$ such that $\bar{y}_j$ is produced.
Furthermore, for each value of $1 \leq j \leq n$, at most one of the
molecules $y_j, \bar{y}_j$ is produced, since otherwise by the closure
property the $j$th reaction described by (15) would be
contained in $\mathcal{R}'$, which would destroy both $y_j$ and $\bar{y}_j$ faster
than either is produced and violate the rate property of the
kRAF $\mathcal{R}'$. A satisfying assignment $S$ for $P$ is now provided
by setting $S(y_j)$ to be true if $y_j$ is produced by some
reaction in $\mathcal{R}'$, and false if $y_j$ is not produced by any
reaction in $\mathcal{R}'$. Note that $S$ is a satisfying
assignment even in the case where neither of $y_j, \bar{y}_j$ is produced
for some $j \in \{1, \ldots, n\}$, since in that case $S(y_j)$ can
be chosen arbitrarily.

Conversely suppose that $P$ has a satisfying assignment
$S$; we will show that $Q_P$ contains a kRAF. Let $\mathcal{R}'$ consist of
reaction (14) together with the following reactions:

• For each $j \in \{1, \ldots, n\}$ such that $S(y_j) = \text{true}$,
include the $j$th reaction from (8), the $j$th reaction
from (10), and every reaction from (12) such that
$j \in T(i)$;

• For each $j \in \{1, \ldots, n\}$ such that $S(y_j) = \text{false}$,
include the $j$th reaction from (9), the $j$th reaction
from (11), and every reaction from (13) such that
$j \in F(i)$.

To show that $\mathcal{R}'$ is a kRAF, we must show that it is a closed
RAF which satisfies the rate requirement (Equation 2).
It is easy to see that 






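$$\begin{aligned}
\mathrm{supp}(\mathcal{R}') &= \mathrm{cl}_{\mathcal{R}'}(F) \\
&= F \cup \{y_j : S(y_j) = \text{true}\} \cup \{\bar{y}_j : S(y_j) = \text{false}\} \\
&\quad \cup \{y_j T : S(y_j) = \text{true}\} \cup \{\bar{y}_j T : S(y_j) = \text{false}\} \\
&\quad \cup \{\theta_1, \ldots, \theta_k\} \cup \{T\},
\end{aligned} \tag{16}$$

so $\mathcal{R}'$ is $F$-generated. Moreover, every reaction is catalysed
by exactly one molecule from the set $\{T\} \cup \{y_j T :
S(y_j) = \text{true}\} \cup \{\bar{y}_j T : S(y_j) = \text{false}\}$, and since
this union is a subset of $\mathrm{cl}_{\mathcal{R}'}(F)$, $\mathcal{R}'$ is also reflexively
autocatalytic and is therefore an RAF set.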

$\mathcal{R}'$ is closed if there are no reactions $r \in \mathcal{R} \setminus \mathcal{R}'$ such
that there exists $(x, r) \in C$ with $\{x\} \cup \rho(r) \subseteq \mathrm{cl}_{\mathcal{R}'}(F)$.
By the construction of $\mathcal{R}'$, $\mathcal{R} \setminus \mathcal{R}'$ contains the following
reactions:

• $f_j \to y_j$ for each $j$ such that $S(y_j) = \text{false}$
(catalysed by $y_j T$);

• $f_j \to \bar{y}_j$ for each $j$ such that $S(y_j) = \text{true}$
(catalysed by $\bar{y}_j T$);

(the catalysts of these reactions are not contained in
$\mathrm{cl}_{\mathcal{R}'}(F)$)

• $y_j + T \to y_j T$ for each $j$ such that $S(y_j) = \text{false}$;

• $\bar{y}_j + T \to \bar{y}_j T$ for each $j$ such that $S(y_j) = \text{true}$;

• $y_j \to \theta_i$ for each pair $(i, j)$ with $j \in T(i)$ and $S(y_j) = \text{false}$;

• $\bar{y}_j \to \theta_i$ for each pair $(i, j)$ with $j \in F(i)$ and $S(y_j) = \text{true}$;

(Other than $T$, the reactants of these reactions are not
contained in $\mathrm{cl}_{\mathcal{R}'}(F)$)

• $y_j + \bar{y}_j \to \omega$ for each $j \in \{1, \ldots, n\}$.

For each value of $j$, exactly one of the two reactants of this
last reaction is contained in $\mathrm{cl}_{\mathcal{R}'}(F)$. Hence, $\mathcal{R}'$ is closed.

It remains to show that $\mathcal{R}'$ satisfies the rate condition
from section "Kinetic RAF framework". Recall that
the rows of the stoichiometric matrix are indexed by the
molecules in the set

$$\mathrm{supp}(\mathcal{R}') \setminus F = \mathrm{cl}_{\mathcal{R}'}(F) \setminus F,$$

the elements of which are given by (16).

The molecules $\{y_j : S(y_j) = \text{true}\}$ are each produced
at rate $k + 1$ from $f_j$, used up at rate $\epsilon > 0$ to produce
$y_j T$, and used up at rate 1 by each of the reactions $\{y_j \to
\theta_i : j \in T(i)\}$. Since there are $k$ clauses, there are at most
$k$ values of $i$ for which $j \in T(i)$. Hence the overall rate of
production of each molecule $y_j$ is at least $k + 1 - (k + \epsilon) =
1 - \epsilon > 0$, which satisfies the rate condition. A similar
argument can be made to show that the molecules $\{\bar{y}_j :
S(y_j) = \text{false}\}$ also satisfy the condition.



The molecule $T$ is produced at rate 1 by the reaction
$\theta_1 + \cdots + \theta_k \to T$, and used up at rate $0 < \epsilon < 1/n$ by each
of the $n$ reactions forming $y_j T$ or $\bar{y}_j T$. Hence the overall
rate of production of $T$ is guaranteed to be positive.

Consider the molecules $\theta_1, \ldots, \theta_k$. $\theta_i$ is produced at rate
1 by each reaction from (12) or (13) that is included in
$\mathcal{R}'$, of which there is at least one (since $P$ has a satisfying
assignment). $\theta_i$ is also used up at rate 1 by reaction (14),
hence the overall rate of formation of $\theta_i$ is non-negative.

Finally, noting that the molecules $y_j T$ and $\bar{y}_j T$ are all produced
at rate $\epsilon > 0$ and are not used by any reaction, we
see that every molecule in $\mathrm{supp}(\mathcal{R}') \setminus F$ is produced at least
as fast as it is used up. This shows that $\mathcal{R}'$ is a kRAF, and
so completes the reduction.